franciscozorrilla / metaGEM

:gem: An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data
https://franciscozorrilla.github.io/metaGEM/
MIT License

Error when running compositionVis task #71

Closed zpf0117b closed 3 years ago

zpf0117b commented 3 years ago

Hi Francisco,

I ran the compositionVis task on the toy dataset you provide with the command bash metaGEM.sh -j 8 -m 32 -c 24 -h 20 -t compositionVis -l, but got the following error message.

cat: sample1//classify/summary.tsv: No such file or directory
cat: sample3//classify/summary.tsv: No such file or directory
During startup - Warning message:
Setting LC_CTYPE failed, using "C"
Error in library(tidyverse) : there is no package called 'tidyverse'
Execution halted
[Mon Jul 26 18:43:18 2021]
Error in rule compositionVis:
    jobid: 0
    output: /data/main/metaGEM/stats/compositionVis.pdf
    shell: ...

This error seems to have two causes.

  1. There were no files matching *summary.tsv in the folders GTDBTk/sample1/classify/ or GTDBTk/sample3/classify/, even though the gtdbtk task finished without reporting an error. At present, listing these folders shows:

    >>> ls -l GTDBTk/sample1/classify/
    drwxrwxr-x. 3 main main 73 Jul 24 13:34 intermediate_results
    >>> ls -l GTDBTk/sample3/classify/
    drwxrwxr-x. 3 main main 73 Jul 24 13:58 intermediate_results

The files in the folder GTDBTk/sample2/classify/ look OK. Maybe something went wrong during the gtdbtk task?

>>> ls -l GTDBTk/sample2/classify/
-rw-rw-r--. 1 main main 115183 Jul 24 14:24 gtdbtk.ar122.classify.tree
-rw-rw-r--. 1 main main 3934 Jul 24 14:24 gtdbtk.ar122.summary.tsv
drwxrwxr-x. 3 main main 167 Jul 24 14:24 intermediate_results

>>> ls -l GTDBTk/sample2/
drwxrwxr-x. 3 main main 265 Jul 24 14:23 align
drwxrwxr-x. 3 main main 116 Jul 24 14:24 classify
lrwxrwxrwx. 1 main main 35 Jul 24 14:24 gtdbtk.ar122.classify.tree -> classify/gtdbtk.ar122.classify.tree
lrwxrwxrwx. 1 main main 31 Jul 24 14:23 gtdbtk.ar122.filtered.tsv -> align/gtdbtk.ar122.filtered.tsv
lrwxrwxrwx. 1 main main 41 Jul 24 14:22 gtdbtk.ar122.markers_summary.tsv -> identify/gtdbtk.ar122.markers_summary.tsv
lrwxrwxrwx. 1 main main 28 Jul 24 14:23 gtdbtk.ar122.msa.fasta -> align/gtdbtk.ar122.msa.fasta
lrwxrwxrwx. 1 main main 33 Jul 24 14:24 gtdbtk.ar122.summary.tsv -> classify/gtdbtk.ar122.summary.tsv
lrwxrwxrwx. 1 main main 33 Jul 24 14:23 gtdbtk.ar122.user_msa.fasta -> align/gtdbtk.ar122.user_msa.fasta
lrwxrwxrwx. 1 main main 32 Jul 24 14:23 gtdbtk.bac120.filtered.tsv -> align/gtdbtk.bac120.filtered.tsv
lrwxrwxrwx. 1 main main 42 Jul 24 14:22 gtdbtk.bac120.markers_summary.tsv -> identify/gtdbtk.bac120.markers_summary.tsv
lrwxrwxrwx. 1 main main 29 Jul 24 14:23 gtdbtk.bac120.msa.fasta -> align/gtdbtk.bac120.msa.fasta
lrwxrwxrwx. 1 main main 34 Jul 24 14:23 gtdbtk.bac120.user_msa.fasta -> align/gtdbtk.bac120.user_msa.fasta
lrwxrwxrwx. 1 main main 34 Jul 24 14:22 gtdbtk.failed_genomes.tsv -> identify/gtdbtk.failed_genomes.tsv
-rw-rw-r--. 1 main main 6014 Jul 24 14:47 gtdbtk.log
lrwxrwxrwx. 1 main main 45 Jul 24 14:22 gtdbtk.translation_table_summary.tsv -> identify/gtdbtk.translation_table_summary.tsv
-rw-rw-r--. 1 main main 0 Jul 24 14:21 gtdbtk.warnings.log
drwxrwxr-x. 3 main main 216 Jul 24 14:22 identify

>>> ls -l GTDBTk/sample1/
drwxrwxr-x. 3 main main 155 Jul 24 13:34 align
drwxrwxr-x. 3 main main 42 Jul 26 17:48 classify
lrwxrwxrwx. 1 main main 41 Jul 24 13:33 gtdbtk.ar122.markers_summary.tsv -> identify/gtdbtk.ar122.markers_summary.tsv
lrwxrwxrwx. 1 main main 32 Jul 24 13:34 gtdbtk.bac120.filtered.tsv -> align/gtdbtk.bac120.filtered.tsv
lrwxrwxrwx. 1 main main 42 Jul 24 13:33 gtdbtk.bac120.markers_summary.tsv -> identify/gtdbtk.bac120.markers_summary.tsv
lrwxrwxrwx. 1 main main 29 Jul 24 13:34 gtdbtk.bac120.msa.fasta -> align/gtdbtk.bac120.msa.fasta
lrwxrwxrwx. 1 main main 34 Jul 24 13:34 gtdbtk.bac120.user_msa.fasta -> align/gtdbtk.bac120.user_msa.fasta
lrwxrwxrwx. 1 main main 34 Jul 24 13:33 gtdbtk.failed_genomes.tsv -> identify/gtdbtk.failed_genomes.tsv
-rw-rw-r--. 1 main main 4220 Jul 24 13:57 gtdbtk.log
lrwxrwxrwx. 1 main main 45 Jul 24 13:33 gtdbtk.translation_table_summary.tsv -> identify/gtdbtk.translation_table_summary.tsv
-rw-rw-r--. 1 main main 0 Jul 24 13:33 gtdbtk.warnings.log
drwxrwxr-x. 3 main main 216 Jul 24 13:33 identify

  2. The R library tidyverse was missing. I've never used R as a programming language before, so I don't know how to install it, while other R libraries (for example, ggplot2) can be used directly after I set up the metaGEM environment.

Could you tell me how to fix the error?
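The first cause above (some sample folders lacking a *summary.tsv file) can be checked for all samples at once. The following is a minimal Python sketch, not part of metaGEM; the function name samples_missing_summary is hypothetical, and it assumes the layout shown in the listings above, i.e. one subfolder per sample under GTDBTk/, each with a classify/ subfolder.

```python
from pathlib import Path


def samples_missing_summary(gtdbtk_root):
    """Return names of sample folders that have no classify/*summary.tsv file.

    Assumes the layout seen in this thread: <gtdbtk_root>/<sample>/classify/.
    """
    missing = []
    for sample_dir in sorted(Path(gtdbtk_root).iterdir()):
        if not sample_dir.is_dir():
            continue
        classify_dir = sample_dir / "classify"
        # A finished run leaves files such as gtdbtk.bac120.summary.tsv here.
        has_summary = classify_dir.is_dir() and any(classify_dir.glob("*summary.tsv"))
        if not has_summary:
            missing.append(sample_dir.name)
    return missing


if __name__ == "__main__":
    # "GTDBTk" is the classification folder name used in this thread.
    root = Path("GTDBTk")
    if root.is_dir():
        print("Samples without a summary file:", samples_missing_summary(root))
```

With the listings above, this would be expected to report sample1 and sample3, the two samples whose classify/ folder contains only intermediate_results.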

zpf0117b commented 3 years ago

The log of task gtdbtk using the command bash metaGEM.sh -j 4 -m 32 -c 4 -t gtdbtk -l showed:

Unlocking snakemake ...
Unlocking working directory.

Dry-running snakemake jobs ...
Building DAG of jobs...
Job stats:
job    count  min threads  max threads
all    1      1            1
total  1      1            1

[Tue Jul 27 02:55:42 2021]
Job 0: WARNING: Be very careful when adding/removing any lines above this message. The metaGEM.sh parser is presently hardcoded to edit line 22 of this Snakefile to expand target rules accordingly, therefore adding/removing any lines before this message will likely result in parser malfunction.

Job stats:
job    count  min threads  max threads
all    1      1            1
total  1      1            1

This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
Do you wish to submit this batch of jobs on your local machine? (y/n)y
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job    count  min threads  max threads
all    1      1            1
total  1      1            1

Select jobs to execute...
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job    count  min threads  max threads
all    1      1            1
total  1      1            1

Select jobs to execute...

[Tue Jul 27 02:55:44 2021]
Job 0: WARNING: Be very careful when adding/removing any lines above this message. The metaGEM.sh parser is presently hardcoded to edit line 22 of this Snakefile to expand target rules accordingly, therefore adding/removing any lines before this message will likely result in parser malfunction.

[Tue Jul 27 02:55:44 2021]
Finished job 0.
1 of 1 steps (100%) done
Complete log: /data/main/metaGEM/.snakemake/log/2021-07-27T025544.096465.snakemake.log

franciscozorrilla commented 3 years ago

Hi @zpf0117b,

Thanks for bringing this bug to my attention. Indeed, line 1 of the script tries to load the tidyverse R package, which is not actually a dependency in the metagem env. Just FYI, the tidyverse includes dplyr and ggplot2.

https://github.com/franciscozorrilla/metaGEM/blob/241678ce8e3d30f8d3ae7426075495665b167d84/scripts/compositionVis.R#L1

I have now fixed this (https://github.com/franciscozorrilla/metaGEM/commit/ee2fdea597742517a03b942dceb08e829fbd9277) so that the ggplot2 + dplyr packages are loaded. In your case you may also want to simply install the tidyverse package to avoid re-cloning or manually modifying scripts:

source activate metagem
conda install -c r r-tidyverse 

Regarding the GTDBTk jobs, it seems like they failed for samples 1 and 3. Could you share the log files for those jobs? You can find them in the logs/ subfolder:

ll logs/ | grep -i gtdb

zpf0117b commented 3 years ago

Hi, @franciscozorrilla ,

Sadly there is nothing in the logs/ subfolder.

The full output message of GTDBTk jobs is

Setting current directory to root in config.yaml file ...

Parsing Snakefile to target rule: gtdbtk ...

Do you wish to continue with these parameters? (y/n)y Proceeding with gtdbtk job(s) ...

Please verify parameters set in the config.yaml file:

path:
    root: /data/main/metaGEM
    scratch: /tmp
folder:
    data: dataset
    logs: logs
    assemblies: assemblies
    scripts: scripts
    crossMap: crossMap
    concoct: concoct
    maxbin: maxbin
    metabat: metabat
    refined: refined_bins
    reassembled: reassembled_bins
    classification: GTDBTk
    abundance: abundance
    GRiD: GRiD
    GEMs: GEMs
    SMETANA: SMETANA
    memote: memote
    qfiltered: qfiltered
    stats: stats
    proteinBins: protein_bins
    dnaBins: dna_bins
    pangenome: pangenome
    kallisto: kallisto
    kallistoIndex: kallistoIndex
    benchmarks: benchmarks
scripts:
    kallisto2concoct: kallisto2concoct.py
    prepRoary: prepareRoaryInput.R
    binFilter: binFilter.py
    qfilterVis: qfilterVis.R
    assemblyVis: assemblyVis.R
    binningVis: binningVis.R
    modelVis: modelVis.R
    compositionVis: compositionVis.R
    taxonomyVis: taxonomyVis.R
    carveme: media_db.tsv
    toy: download_toydata.txt
    GTDBtkVis:
cores:
    fastp: 8
    megahit: 12
    crossMap: 12
    concoct: 12
    metabat: 12
    maxbin: 12
    refine: 12
    reassemble: 12
    classify: 2
    gtdbtk: 12
    abundance: 12
    carveme: 4
    smetana: 12
    memote: 4
    grid: 12
    prokka: 2
    roary: 12
params:
    cutfasta: 10000
    assemblyPreset: meta-sensitive
    assemblyMin: 1000
    concoct: 800
    metabatMin: 50000
    seed: 420
    minBin: 1500
    refineMem: 1600
    refineComp: 50
    refineCont: 10
    reassembleMem: 1600
    reassembleComp: 50
    reassembleCont: 10
    carveMedia: M8
    smetanaMedia: M1,M2,M3,M4,M5,M7,M8,M9,M10,M11,M13,M14,M15A,M15B,M16
    smetanaSolver: CPLEX
    roaryI: 90
    roaryCD: 90
envs:
    metagem: metagem
    metawrap: metawrap
    prokkaroary: prokkaroary

Please pay close attention to make sure that your paths are properly configured! Do you wish to proceed with this config.yaml file? (y/n)y

Unlocking snakemake ...
Unlocking working directory.

Dry-running snakemake jobs ...
Building DAG of jobs...
Job stats:
job    count  min threads  max threads
all    1      1            1
total  1      1            1

[Wed Jul 28 00:27:56 2021]
Job 0: WARNING: Be very careful when adding/removing any lines above this message. The metaGEM.sh parser is presently hardcoded to edit line 22 of this Snakefile to expand target rules accordingly, therefore adding/removing any lines before this message will likely result in parser malfunction.

Job stats:
job    count  min threads  max threads
all    1      1            1
total  1      1            1

This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
Do you wish to submit this batch of jobs on your local machine? (y/n)y
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job    count  min threads  max threads
all    1      1            1
total  1      1            1

Select jobs to execute...

[Wed Jul 28 00:27:58 2021]
Job 0: WARNING: Be very careful when adding/removing any lines above this message. The metaGEM.sh parser is presently hardcoded to edit line 22 of this Snakefile to expand target rules accordingly, therefore adding/removing any lines before this message will likely result in parser malfunction.

Gathering /data/main/metaGEM/GTDBTk/sample1 /data/main/metaGEM/GTDBTk/sample2 /data/main/metaGEM/GTDBTk/sample3 ...
[Wed Jul 28 00:27:58 2021]
Finished job 0.
1 of 1 steps (100%) done
Complete log: /data/main/metaGEM/.snakemake/log/2021-07-28T002758.193068.snakemake.log

This message shows the log is stored in the /data/main/metaGEM/.snakemake/log/2021-07-28T002758.193068.snakemake.log file, and here is the content of this file:

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job    count  min threads  max threads
all    1      1            1
total  1      1            1

Select jobs to execute...

[Wed Jul 28 00:27:58 2021]
Job 0: WARNING: Be very careful when adding/removing any lines above this message. The metaGEM.sh parser is presently hardcoded to edit line 22 of this Snakefile to expand target rules accordingly, therefore adding/removing any lines before this message will likely result in parser malfunction.

[Wed Jul 28 00:27:58 2021]
Finished job 0.
1 of 1 steps (100%) done
Complete log: /data/main/metaGEM/.snakemake/log/2021-07-28T002758.193068.snakemake.log

franciscozorrilla commented 3 years ago

OK, I think I see what's going on. You are running GTDBTk locally instead of submitting to the cluster scheduler, which is why you don't have any files in the logs/ folder. GTDBTk requires ~204 GB of RAM to run successfully, so that is likely why your jobs are failing. You could try adding the --scratch_dir <dir> flag to the GTDBTk call in the Snakefile on this line:

https://github.com/franciscozorrilla/metaGEM/blob/ee2fdea597742517a03b942dceb08e829fbd9277/Snakefile#L1330

However, I would recommend keeping the Snakefile as is and submitting GTDBTk jobs to the cluster instead of running locally.
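Before launching GTDBTk locally, it may help to compare the machine's physical memory against the ~204 GB figure quoted above. The following is a rough Python sketch under stated assumptions: it is not part of metaGEM, the names GTDBTK_RAM_GB, total_ram_gb and enough_ram are hypothetical, and total_ram_gb relies on os.sysconf values that are available on Linux but not on every platform.

```python
import os

# Approximate requirement quoted in this thread; treat it as a rough guide.
GTDBTK_RAM_GB = 204


def total_ram_gb():
    """Total physical memory in GB, read via sysconf (Linux; may fail elsewhere)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    n_pages = os.sysconf("SC_PHYS_PAGES")
    return page_size * n_pages / 1e9


def enough_ram(required_gb, available_gb):
    """True if the available memory covers the stated requirement."""
    return available_gb >= required_gb


if __name__ == "__main__":
    available = total_ram_gb()
    if enough_ram(GTDBTK_RAM_GB, available):
        print(f"{available:.0f} GB detected: a local run may be feasible.")
    else:
        print(f"{available:.0f} GB detected: below ~{GTDBTK_RAM_GB} GB, "
              "consider submitting to a cluster.")
```

For instance, a 32 GB machine would fall well below the quoted requirement, which is consistent with the failed local jobs described here.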

Also note that when running locally, only one job will be submitted at a time, and you do not need to specify the number of jobs (-j), cores (-c), or RAM (-m), as the last two parameters are only used for cluster job submissions. To modify the number of cores used by the jobs locally you need to modify the appropriate fields in the config.yaml file. Local jobs will use all RAM available to them.

Let me know if this helps or if you have additional questions!

zpf0117b commented 3 years ago

Hi, @franciscozorrilla ,

It seems that GTDBTk now runs smoothly. Here are the results in the GTDBTk folder:

> ls /data/main/metaGEM/GTDBTk/sample1
align gtdbtk.ar122.markers_summary.tsv gtdbtk.bac120.filtered.tsv gtdbtk.bac120.msa.fasta gtdbtk.bac120.user_msa.fasta gtdbtk.log gtdbtk.warnings.log classify gtdbtk.bac120.classify.tree gtdbtk.bac120.markers_summary.tsv gtdbtk.bac120.summary.tsv gtdbtk.failed_genomes.tsv gtdbtk.translation_table_summary.tsv identify

> ls /data/main/metaGEM/GTDBTk/sample2
align gtdbtk.ar122.filtered.tsv gtdbtk.ar122.summary.tsv gtdbtk.bac120.filtered.tsv gtdbtk.bac120.summary.tsv gtdbtk.log identify classify gtdbtk.ar122.markers_summary.tsv gtdbtk.ar122.user_msa.fasta gtdbtk.bac120.markers_summary.tsv gtdbtk.bac120.user_msa.fasta gtdbtk.translation_table_summary.tsv gtdbtk.ar122.classify.tree gtdbtk.ar122.msa.fasta gtdbtk.bac120.classify.tree gtdbtk.bac120.msa.fasta gtdbtk.failed_genomes.tsv gtdbtk.warnings.log

> ls /data/main/metaGEM/GTDBTk/sample3
align gtdbtk.ar122.markers_summary.tsv gtdbtk.bac120.filtered.tsv gtdbtk.bac120.msa.fasta gtdbtk.bac120.user_msa.fasta gtdbtk.log gtdbtk.warnings.log classify gtdbtk.bac120.classify.tree gtdbtk.bac120.markers_summary.tsv gtdbtk.bac120.summary.tsv gtdbtk.failed_genomes.tsv gtdbtk.translation_table_summary.tsv identify

> ls /data/main/metaGEM/GTDBTk/sample1/classify/
gtdbtk.bac120.classify.tree gtdbtk.bac120.summary.tsv intermediate_results

> ls /data/main/metaGEM/GTDBTk/sample2/classify/
gtdbtk.ar122.classify.tree gtdbtk.ar122.summary.tsv gtdbtk.bac120.classify.tree gtdbtk.bac120.summary.tsv intermediate_results

> ls /data/main/metaGEM/GTDBTk/sample3/classify/
gtdbtk.bac120.classify.tree gtdbtk.bac120.summary.tsv intermediate_results

However, the compositionVis job then failed with another error. This happened after I renamed the file GTDBTk.stats to GTDBtk.stats to fix the error In file(file, "rt"): cannot open file 'GTDBtk.stats': No such file or directory, so that the file name matches the input expected by compositionVis.R: taxonomy=read.delim("GTDBtk.stats",header=TRUE) %>%

The error message shows:

During startup - Warning message:
Setting LC_CTYPE failed, using "C"
Warning message:
package 'ggplot2' was built under R version 4.0.5

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

filter, lag

The following objects are masked from 'package:base':

intersect, setdiff, setequal, union

Warning message:
package 'dplyr' was built under R version 4.0.5
Error in separate(., classification, into = c("kingdom", "phylum", "class", :
  could not find function "separate"
Calls: %>%
Execution halted
[Wed Jul 28 16:41:41 2021]
Error in rule compositionVis: ...

And here are the first two lines of the file GTDBtk.stats (renamed from GTDBTk.stats), in case it helps:

user_genome     classification  fastani_reference       fastani_reference_radius        fastani_taxonomy        fastani_ani     fastani_af      closest_placement_reference     closest_placement_radius        closest_placement_taxonomy closest_placement_ani   closest_placement_af    pplacer_taxonomy        classification_method   note    other_related_references(genome_id,species_name,radius,ANI,AF)  msa_percent     translation_table  red_value       warnings
bin.1.orig      d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Coprococcus;s__Coprococcus eutactus_A GCF_001404675.1 95.0    d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Coprococcus;s__Coprococcus eutactus_A    98.92   0.97    GCF_001404675.1 95.0    d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Coprococcus;s__Coprococcus eutactus_A    98.92   0.97    d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Coprococcus;s__       taxonomic classification defined by topology and ANI    topological placement and ANI have congruent species assignments   GCA_900767685.1, s__Coprococcus sp900767685, 95.0, 89.43, 0.61; GCF_000154425.1, s__Coprococcus eutactus, 95.0, 89.18, 0.83; GCA_900557435.1, s__Coprococcus sp900557435, 95.0, 88.75, 0.64; GCA_900548215.1, s__Coprococcus sp900548215, 95.0, 88.31, 0.68; GCA_900548315.1, s__Coprococcus sp900548315, 95.0, 88.14, 0.76; GCF_003482105.1, s__Coprococcus sp000433075, 95.0, 82.19, 0.27; GCF_003461625.1, s__Coprococcus sp900066115, 95.0, 80.55, 0.22; GCA_002437435.1, s__Coprococcus sp002437435, 95.0, 80.05, 0.2; GCF_000154245.1, s__Coprococcus sp000154245, 95.0, 80.01, 0.27; GCA_900761435.1, s__Coprococcus sp900761435, 95.0, 77.48, 0.1        47.47   11      N/A     N/A

franciscozorrilla commented 3 years ago

Glad the GTDBTk jobs ran successfully and the results are now present.

Thank you for highlighting this additional bug. Indeed, the script should be loading the file GTDBTk.stats instead of GTDBtk.stats; this is now fixed in the latest commit (https://github.com/franciscozorrilla/metaGEM/commit/35eff4cb04b309068898e4c9ac91bd5dade54de4).
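File names that differ only in letter case, such as GTDBTk.stats and GTDBtk.stats, are easy to miss on a case-sensitive file system. The following is a small Python sketch for spotting them; it is not part of metaGEM, and the function name find_case_variants is hypothetical.

```python
from pathlib import Path


def find_case_variants(directory, expected_name):
    """List files whose names match expected_name except for letter case."""
    wanted = expected_name.lower()
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.name.lower() == wanted and p.name != expected_name
    )


if __name__ == "__main__":
    # Look in the current folder for near-matches of the name the script expects.
    print(find_case_variants(".", "GTDBtk.stats"))
```

If the folder contains GTDBTk.stats while a script asks for GTDBtk.stats, the function returns the existing name, which points to a case mismatch rather than a missing file.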

Regarding your last error, it appears that the separate() function is in the tidyr package, which is part of the tidyverse. I believe your problems should be solved by installing either:

 conda install -c r r-tidyr 

or

 conda install -c r r-tidyverse

I will update the metaGEM recipe file to include either the tidyverse or tidyr packages.
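For readers unfamiliar with what the separate() step does to the classification column, here is a rough Python analogue, offered as an illustration only and not as the code in compositionVis.R. The error message above shows only the first three target column names ("kingdom", "phylum", "class"), so the remaining names in RANKS are an assumption, as are the function name split_classification and the optional prefix stripping.

```python
# Target column names: the first three appear in the error message in this
# thread; the remaining four are assumed for illustration.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def split_classification(classification, strip_prefix=False):
    """Split a GTDB-Tk classification string on ';' into one value per rank.

    With strip_prefix=True the rank prefixes (d__, p__, ...) are removed.
    Missing trailing ranks are filled with empty strings.
    """
    values = classification.split(";")
    if strip_prefix:
        values = [v.split("__", 1)[-1] for v in values]
    values = values[:len(RANKS)] + [""] * (len(RANKS) - len(values))
    return dict(zip(RANKS, values))


if __name__ == "__main__":
    # Classification string taken from the GTDBTk.stats excerpt in this thread.
    row = ("d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;"
           "f__Lachnospiraceae;g__Coprococcus;s__Coprococcus eutactus_A")
    print(split_classification(row, strip_prefix=True))
```

A species entry of just s__ becomes an empty string with strip_prefix=True, which is one way to recognise MAGs that were not classified down to species level.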

zpf0117b commented 3 years ago

Hi @franciscozorrilla, it appears the file compositionVis.R needs another package, tidytext (see the discussion in https://twitter.com/juliasilge/status/1077606510551683072 and the documentation in https://cran.r-project.org/web/packages/tidytext/tidytext.pdf), which provides the function scale_x_reordered(). We can install this package by running:

conda install -c r r-tidytext

With this, the compositionVis task finished successfully.

franciscozorrilla commented 3 years ago

I have now replaced dplyr and ggplot2 with tidyverse + added tidytext in both the metagem_env.yml conda recipe file and the compositionVis.R script.

Thanks a lot for reporting these bugs! If everything is working smoothly now I will close the issue, feel free to reopen if anything else comes up.

kunaljaani commented 2 years ago

Hi Francisco,

I have a question regarding the composition. In my case, >90% of the MAGs are not assigned at the genus/species rank, so I am unable to generate the plots using compositionVis.R.

When I tried to make the classical abundance table by combining abundance.stats and GTDBTk.stats, I found that GTDBTk.stats is missing the sample ID of each MAG (I have attached the text files for your reference). Could you please suggest a fix to generate a taxonomy+abundance table?

Thank you. Kunal

GTDBTk_stats.txt abundance_stats.txt

franciscozorrilla commented 2 years ago

Hi Kunal,

You can simply modify the R script to show e.g. class or genus level taxonomic assignments instead by replacing the species term in the following command.

https://github.com/franciscozorrilla/metaGEM/blob/cdaeb25e5177751b1df21d078c204377054e51c4/scripts/compositionVis.R#L16-L17

You should also remove the filter step at the start. For example, for class level taxonomic assignments, modify the above two lines as follows:

 ggplot(taxab) + 
   geom_bar(aes(x=reorder_within(class,-rel_ab,sample),y=rel_ab*100),stat="identity") +  

The plot currently fails to visualize abundances because the abundance and taxonomy files cannot be merged. The two files that you have provided seem to correspond to different sets of MAGs: for example, the abundance table has 17 MAGs, while the taxonomy table has 18. One has bin IDs such as bin.40.orig, while the other has IDs such as M.bin.1.o. I am not sure how that happened; maybe files from different analyses got mixed up? To fix this issue, please use matching abundance and taxonomy files.
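A quick way to see why two tables will not merge is to compare their ID columns directly. The sketch below is a generic Python illustration, not metaGEM code; the function name compare_ids is hypothetical, and the example IDs are the two quoted in this thread.

```python
def compare_ids(abundance_ids, taxonomy_ids):
    """Report which MAG IDs are shared and which appear in only one table."""
    a, t = set(abundance_ids), set(taxonomy_ids)
    return {
        "shared": sorted(a & t),
        "only_abundance": sorted(a - t),
        "only_taxonomy": sorted(t - a),
    }


if __name__ == "__main__":
    # IDs in the two styles mentioned above; real tables would supply full columns.
    report = compare_ids(["M.bin.1.o"], ["bin.40.orig"])
    for key, ids in report.items():
        print(key, ids)
```

An empty "shared" list means a merge on the ID column will produce no rows, so the plot has nothing to draw.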

Best wishes, Francisco

kunaljaani commented 2 years ago

Hi!

Thank you for your reply. Actually, the files are from the same run (I had provided only a few lines from each file as a representative sample). As you mentioned, in one of the files (abundance.stats) the IDs start with the sample name "M"; however, in the corresponding GTDBTk.stats file the bin IDs don't include the sample IDs. Do you think I have messed up somewhere?

Thanks a lot. Kunal

GTDBTk.stats.txt abundance.stats.txt

franciscozorrilla commented 2 years ago

Yes, something has gone wrong. Could you provide more details regarding the steps that you followed?

If you followed the general workflow outlined in the tutorial, the sample IDs should be encoded in the MAG filename. Specifically, in section 5 you can see how to run the extractDnaBins rule. For example, if one of your sample IDs is ERR260137, then the corresponding bins should automatically be named e.g. ERR260137_bin.1.o.fa.

Here you can see the underlying code; essentially, it just copies and renames the bins from the metawrap bin reassembly output.

https://github.com/franciscozorrilla/metaGEM/blob/cdaeb25e5177751b1df21d078c204377054e51c4/Snakefile#L1741-L1768

kunaljaani commented 2 years ago

Yes, I have followed the steps described in the tutorial with -l, because I am running it on a cloud instance. All my files are correctly labeled with the sample initials, e.g. "M", and are also sorted into the respective folders (as can be seen from the screenshot of the GEMs and protein bin folders).

[Attached screenshot: fileName]

franciscozorrilla commented 2 years ago

I see, those are your sample IDs. Could you also check the contents of your dna_bins folder? I would suggest trying to re-generate the abundance.stats file as well as the GTDBTk.stats file.

Please run the compositionVis rule again to re-generate the files, or alternatively run the code to manually re-generate each file:

https://github.com/franciscozorrilla/metaGEM/blob/cdaeb25e5177751b1df21d078c204377054e51c4/Snakefile#L1398-L1436

The MAG IDs should follow the pattern {sampleID}_{bin_ID}.fa
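To check a folder of MAG files against this naming convention, something like the following Python sketch could be used. It is not taken from the metaGEM Snakefile; the regular expression is an assumption modelled on the example ERR260137_bin.1.o.fa given earlier in this thread (sample ID, underscore, bin ID, .fa), and the names MAG_PATTERN and parse_mag_filename are hypothetical.

```python
import re

# Assumed shape: <sampleID>_bin.<number>.<suffix>.fa, as in ERR260137_bin.1.o.fa
MAG_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<bin>bin\.\d+\.\w+)\.fa$")


def parse_mag_filename(name):
    """Return (sample_id, bin_id) if name follows the assumed pattern, else None."""
    match = MAG_PATTERN.match(name)
    if match is None:
        return None
    return match.group("sample"), match.group("bin")


if __name__ == "__main__":
    for name in ["ERR260137_bin.1.o.fa", "bin.40.orig.fa"]:
        print(name, "->", parse_mag_filename(name))
```

Names that come back as None lack a sample ID prefix in the assumed form, which is the kind of mismatch discussed above.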

kunaljaani commented 2 years ago

Yes, even the IDs in the dna_bins folder look fine. Thanks for your input; I will rerun compositionVis.

Thanks a lot. Kunal

[Attached screenshot: dna_bins]