YuLab-SMU / MicrobiotaProcess

:microbe: A comprehensive R package for deep mining microbiome
https://www.sciencedirect.com/science/article/pii/S2666675823000164

Error of mp_import_metaphlan #88

Open alienzj opened 1 year ago

alienzj commented 1 year ago

When I run mp_import_metaphlan to import a MetaPhlAn4 profile, I get the error below:

mpse <- mp_import_metaphlan(profile = mpa4_profile_path, mapfilename = metadata_path)
Warning: non-unique values when setting 'row.names': ‘s__un_c__Bacilli’, ‘s__un_c__Clostridia’, ‘s__un_f__Eggerthellaceae’, ‘s__un_f__Fusobacteriaceae’, ‘s__un_f__Lachnospiraceae’, ‘s__un_f__Rikenellaceae’, ‘s__un_f__Ruminococcaceae’, ‘s__un_o__Bacteroidales’, ‘s__un_o__Clostridiales’
Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed

Then I checked the MetaPhlAn4 profile:

➤ awk '{print $1}' tables/taxonomic_profiles/202304/mpa4_profile_format.tsv | awk -F\| '{print $7}' | wc -l
1801
➤ awk '{print $1}' tables/taxonomic_profiles/202304/mpa4_profile_format.tsv | awk -F\| '{print $7}' | sort | uniq | wc -l
1801

It seems there are no duplicated species names in mpa4_profile_format.tsv, so I am a little confused about the error raised by mp_import_metaphlan.
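Comparing the two counts only shows whether any duplicate exists. A more direct check is to print the duplicated names themselves. The following is a minimal sketch, not part of MicrobiotaProcess or MetaPhlAn: the function name `dup_species` is hypothetical, and it assumes a tab-separated profile whose first column holds the `|`-separated clade name, with species-level rows having exactly seven ranks.

```shell
#!/bin/sh
# Hypothetical helper: print every species-level name that occurs more than once.
# Assumes column 1 of a tab-separated profile is the '|'-separated clade name.
dup_species() {
    # 1. drop '#' comment lines
    # 2. keep only the clade-name column
    # 3. keep rows with exactly 7 ranks (species level) and print the 7th rank
    # 4. sort, then print one copy of each value that appears more than once
    grep -v '^#' "$1" \
        | cut -f1 \
        | awk -F'|' 'NF == 7 {print $7}' \
        | sort \
        | uniq -d
}
```

Called as `dup_species tables/taxonomic_profiles/202304/mpa4_profile_format.tsv`, it prints nothing when every species name is unique. The names in the warning (for example `s__un_c__Bacilli`) have a different form from the clade names in the profile, so an empty result here would suggest that the duplicates are produced during import rather than being present in the input file.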

Thanks in advance for any help!

alienzj commented 1 year ago

Some clade names in the MetaPhlAn4 profile look like this:

s__Clostridia_unclassified_SGB14196
s__Clostridia_bacterium_UC5_1_1E11
s__Clostridiales_bacterium_CHKCI006
s__Clostridiales_bacterium_Marseille_P5551
s__Clostridia_unclassified_SGB4313
s__Clostridia_unclassified_SGB13999
s__Clostridia_unclassified_SGB13972
s__Clostridia_unclassified_SGB14297
s__Clostridiales_bacterium_KA00274
s__Clostridia_unclassified_SGB6317
s__Clostridia_unclassified_SGB14308

alienzj commented 1 year ago

I checked the taxatab variable generated inside mp_import_metaphlan. Its contents suggest there is an issue when parsing the MetaPhlAn4 profile: [image]

Is there a way to solve this issue quickly? @xiangpin

xiangpin commented 1 year ago

It seems that MetaPhlAn4 updated its reference database to include some unknown microbial species (uSGBs). I might need the MetaPhlAn4 output file to debug this.

alienzj commented 1 year ago

I see. You can use the attached file: metaphlan4.merged.abundance.profile.all.tsv.gz

xiangpin commented 1 year ago

The file can be parsed with both version 1.10.3 and the devel version. I will generate a MetaPhlAn4 output myself to test again.

> library(MicrobiotaProcess)
MicrobiotaProcess v1.10.3 For help:
https://github.com/YuLab-SMU/MicrobiotaProcess/issues

If you use MicrobiotaProcess in published research, please cite the
paper:

Shuangbin Xu, Li Zhan, Wenli Tang, Qianwen Wang, Zehan Dai, Land Zhou,
Tingze Feng, Meijun Chen, Tianzhi Wu, Erqiang Hu, Guangchuang Yu.
MicrobiotaProcess: A comprehensive R package for deep mining
microbiome. The Innovation. 2023, 100388. doi:
10.1016/j.xinn.2023.100388

Export the citation to BibTex by citation('MicrobiotaProcess')

This message can be suppressed by:
suppressPackageStartupMessages(library(MicrobiotaProcess))
> mpse <- mp_import_metaphlan('./metaphlan4.merged.abundance.profile.all.tsv')
> mpse
# A MPSE-tibble (MPSE object) abstraction: 1,840 × 11
# OTU=92 | Samples=20 | Assays=Abundance | Taxonomy=Kingdom, Phylum, Class, Order, Family, Genus, Speies
   OTU      Sample Abundance clade_taxid Kingdom Phylum Class Order Family Genus
   <chr>    <chr>      <dbl> <chr>       <chr>   <chr>  <chr> <chr> <chr>  <chr>
 1 t__SGB6… hCom1…      3.03 2|1239|909… k__Bac… p__Fi… c__N… o__V… f__Ve… g__V…
 2 t__SGB1… hCom1…      2.02 2|976|2006… k__Bac… p__Ba… c__B… o__B… f__Ta… g__P…
 3 t__SGB1… hCom1…      1.86 2|976|2006… k__Bac… p__Ba… c__B… o__B… f__un… g__P…
 4 t__SGB5… hCom1…      1.80 2|1239|909… k__Bac… p__Fi… c__N… o__V… f__Ve… g__D…
 5 t__SGB1… hCom1…      1.70 2|201174|1… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 6 t__SGB8… hCom1…      1.67 2|1239|910… k__Bac… p__Fi… c__B… o__L… f__Ca… g__G…
 7 t__SGB1… hCom1…      1.67 2|201174|8… k__Bac… p__Ac… c__C… o__C… f__At… g__O…
 8 t__SGB1… hCom1…      1.65 2|201174|8… k__Bac… p__Ac… c__C… o__E… f__Eg… g__S…
 9 t__SGB7… hCom1…      1.64 2|1239|910… k__Bac… p__Fi… c__B… o__L… f__La… g__L…
10 t__SGB5… hCom1…      1.56 2|1239|909… k__Bac… p__Fi… c__N… o__A… f__Ac… g__A…
# ℹ 1,830 more rows
# ℹ 1 more variable: Speies <chr>
# ℹ Use `print(n = ...)` to see more rows
> library(MicrobiotaProcess)
MicrobiotaProcess v1.11.4.993 For help:
https://github.com/YuLab-SMU/MicrobiotaProcess/issues

If you use MicrobiotaProcess in published research, please cite the
paper:

Shuangbin Xu, Li Zhan, Wenli Tang, Qianwen Wang, Zehan Dai, Lang Zhou,
Tingze Feng, Meijun Chen, Tianzhi Wu, Erqiang Hu, Guangchuang Yu.
MicrobiotaProcess: A comprehensive R package for deep mining
microbiome. The Innovation. 2023, 4(2):100388. doi:
10.1016/j.xinn.2023.100388

Export the citation to BibTex by citation('MicrobiotaProcess')

This message can be suppressed by:
suppressPackageStartupMessages(library(MicrobiotaProcess))
> mpse <- mp_import_metaphlan('./metaphlan4.merged.abundance.profile.all.tsv')
> mpse
# A MPSE-tibble (MPSE object) abstraction: 1,840 × 11
# OTU=92 | Samples=20 | Assays=Abundance | Taxonomy=Kingdom, Phylum, Class, Order, Family, Genus, Speies
   OTU      Sample Abundance clade_taxid Kingdom Phylum Class Order Family Genus
   <chr>    <chr>      <dbl> <chr>       <chr>   <chr>  <chr> <chr> <chr>  <chr>
 1 t__SGB6… hCom1…      3.03 2|1239|909… k__Bac… p__Fi… c__N… o__V… f__Ve… g__V…
 2 t__SGB1… hCom1…      2.02 2|976|2006… k__Bac… p__Ba… c__B… o__B… f__Ta… g__P…
 3 t__SGB1… hCom1…      1.86 2|976|2006… k__Bac… p__Ba… c__B… o__B… f__un… g__P…
 4 t__SGB5… hCom1…      1.80 2|1239|909… k__Bac… p__Fi… c__N… o__V… f__Ve… g__D…
 5 t__SGB1… hCom1…      1.70 2|201174|1… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 6 t__SGB8… hCom1…      1.67 2|1239|910… k__Bac… p__Fi… c__B… o__L… f__Ca… g__G…
 7 t__SGB1… hCom1…      1.67 2|201174|8… k__Bac… p__Ac… c__C… o__C… f__At… g__O…
 8 t__SGB1… hCom1…      1.65 2|201174|8… k__Bac… p__Ac… c__C… o__E… f__Eg… g__S…
 9 t__SGB7… hCom1…      1.64 2|1239|910… k__Bac… p__Fi… c__B… o__L… f__La… g__L…
10 t__SGB5… hCom1…      1.56 2|1239|909… k__Bac… p__Fi… c__N… o__A… f__Ac… g__A…
# ℹ 1,830 more rows
# ℹ 1 more variable: Speies <chr>
# ℹ Use `print(n = ...)` to see more rows

shaodongyan commented 1 year ago

But mp_import_metaphlan doesn't work with the tree "mpa_vJan21_CHOCOPhlAnSGB_202103.nwk":

Warning messages:
1: The number of features in otu table is not equal the number of tips in otu tree.
• The same features will be extract automatically !
2: In (function (phy, tip, trim.internal = TRUE, subtree = FALSE, root.edge = 0, :
  drop all tips of the tree: returning NULL
3: In (function (phy, tip, trim.internal = TRUE, subtree = FALSE, root.edge = 0, :
  drop all tips of the tree: returning NULL

Brynnhildr commented 1 year ago

Hello, have you resolved this issue yet? I am currently encountering the same problem. My tree file is "mpa_vOct22_CHOCOPhlAnSGB_202212.nwk", and my MetaPhlAn version is MetaPhlAn4. I get the following messages when using mp_import_metaphlan:

Warning messages:
1: The number of features in otu table is not equal the number of tips in otu tree.
• The same features will be extract automatically !
2: In drop.tip.phylo(phy = list(edge = c(30212L, 30213L, 30213L, 30214L, :
  drop all tips of the tree: returning NULL
3: In drop.tip.phylo(collapse.singles = FALSE, phy = list(edge = c(787, :
  drop all tips of the tree: returning NULL