UPHL-BioNGS / Grandeur

UPHL's Reference Free Pipeline
GNU General Public License v3.0

Setting outgroup taxon to fastani reference #132

Closed (closed by DrB-S 9 months ago)

DrB-S commented 10 months ago

When adding a fastani reference to the command-line using "-resume --fastani_ref ncbi_dataset/data/GCF_000195955.2/GCF_000195955.2_ASM19595v2_genomic.fna", do I need to use that entire name (GCF_000195955.2_ASM19595v2_genomic) for the outgroup taxon as well for iqtree2?

DrB-S commented 10 months ago

The pipeline errored out at fastani with the following command line:
nextflow run UPHL-BioNGS/Grandeur -profile singularity,msa --medcpus 100 --maxcpus 120 --reads reads --outgroup GCF_005156105.1 --fastani_ref ncbi_dataset/data/GCF_005156105.1/GCF_005156105.1_ASM515610v1_genomic.fna --current_datasets true -resume

I have attached the nextflow log: nextflow_5Oct23.log

The pipeline ran when using only "--current_datasets true", but not when I specified "--fastani_ref", with or without "--current_datasets true".

erinyoung commented 10 months ago

> When adding a fastani reference to the command-line using "-resume --fastani_ref ncbi_dataset/data/GCF_000195955.2/GCF_000195955.2_ASM19595v2_genomic.fna", do I need to use that entire name (GCF_000195955.2_ASM19595v2_genomic) for the outgroup taxon as well for iqtree2?

You will need to use the name that iqtree2 expects, which is generally the basename of the file. In your case it is likely 'GCF_000195955.2_ASM19595v2_genomic', although the iqtree2 error that was printed to your screen is probably more useful.
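As a rough illustration of the basename rule, here is a minimal Python sketch that derives a candidate outgroup name from a reference path. The helper name `guess_taxon_name` and the list of extensions it strips are assumptions made for this example; they are not part of Grandeur or iqtree2. The name reported in the iqtree2 error message remains the authority.

```python
from pathlib import Path

# Assumption: these are the FASTA extensions worth stripping; adjust as needed.
FASTA_SUFFIXES = (".fna", ".fasta", ".fa")


def guess_taxon_name(fasta_path: str) -> str:
    """Return the file's basename with any .gz and FASTA extension removed."""
    name = Path(fasta_path).name
    if name.endswith(".gz"):
        name = name[: -len(".gz")]
    for suffix in FASTA_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


print(guess_taxon_name(
    "ncbi_dataset/data/GCF_000195955.2/GCF_000195955.2_ASM19595v2_genomic.fna"
))
# prints GCF_000195955.2_ASM19595v2_genomic
```

Only the last extension is stripped, so the version dot inside the accession (GCF_000195955.2) is left intact.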

erinyoung commented 10 months ago

> Pipeline errored out at fastani with the following command-line: nextflow run UPHL-BioNGS/Grandeur -profile singularity,msa --medcpus 100 --maxcpus 120 --reads reads --outgroup GCF_005156105.1 --fastani_ref ncbi_dataset/data/GCF_005156105.1/GCF_005156105.1_ASM515610v1_genomic.fna --current_datasets true -resume. I have attached the nextflow log. nextflow_5Oct23.log
>
> The pipeline ran when using only "--current_datasets true", but not when I specified "--fastani_ref", with or without "--current_datasets true".

The nextflow log is useful, but generally doesn't specify the error that was encountered. Can you share with me the nextflow error message? It's generally printed to the screen and is very long.

DrB-S commented 10 months ago

Unfortunately, I don't have that if it isn't in .nextflow.log.

erinyoung commented 10 months ago

I have a new version of Grandeur going through testing (https://github.com/UPHL-BioNGS/Grandeur/pull/134), but it isn't going to fix the issue that occurs when fastani does not have any top hits and phylogenetic analysis is attempted.

This isn't helpful to you, but I just wanted to make sure you didn't get your hopes up. I have some workarounds here: https://github.com/UPHL-BioNGS/Grandeur/issues/130#issuecomment-1756422961

erinyoung commented 10 months ago

What is in your work subdirectory /data/Sequence_analysis/Grandeur/Analyses/TB/Mbovis_4Oct23/work/ff/0786df86b1e84881bb4050cf539925?

Edit: the path was mangled when I copied and pasted it. It should be correct now.

DrB-S commented 10 months ago

That work dir contains the following Mycobacterium genomes:

Mycobacterium genomes in /work/ff/0786df86b1e84881bb4050cf539925:
-rw-rw-r-- 1 becksts becksts 1332944 Oct  5 13:55 /data/Sequence_analysis/Grandeur/Analyses/TB/Mbovis_4Oct23/work/ff/0786df86b1e84881bb4050cf539925/Mycobacterium_tuberculosis_GCF_000195955.2_ds.fna.gz
-rw-rw-r-- 1 becksts becksts 1293757 Oct  5 13:55 /data/Sequence_analysis/Grandeur/Analyses/TB/Mbovis_4Oct23/work/ff/0786df86b1e84881bb4050cf539925/Mycobacterium_tuberculosis_GCF_000633085.1_ds.fna.gz
-rw-rw-r-- 1 becksts becksts 1334568 Oct  5 13:55 /data/Sequence_analysis/Grandeur/Analyses/TB/Mbovis_4Oct23/work/ff/0786df86b1e84881bb4050cf539925/Mycobacterium_tuberculosis_GCF_000666025.1_ds.fna.gz
-rw-rw-r-- 1 becksts becksts 1336017 Oct  5 13:55 /data/Sequence_analysis/Grandeur/Analyses/TB/Mbovis_4Oct23/work/ff/0786df86b1e84881bb4050cf539925/Mycobacterium_tuberculosis_GCF_000666045.1_ds.fna.gz
-rw-rw-r-- 1 becksts becksts 1324667 Oct  5 13:55 /data/Sequence_analysis/Grandeur/Analyses/TB/Mbovis_4Oct23/work/ff/0786df86b1e84881bb4050cf539925/Mycobacterium_tuberculosis_GCF_000666065.1_ds.fna.gz
-rw-rw-r-- 1 becksts becksts 1331181 Oct  5 13:55 /data/Sequence_analysis/Grandeur/Analyses/TB/Mbovis_4Oct23/work/ff/0786df86b1e84881bb4050cf539925/Mycobacterium_tuberculosis_GCF_000666085.1_ds.fna.gz
-rw-rw-r-- 1 becksts becksts 1334237 Oct  5 13:55 /data/Sequence_analysis/Grandeur/Analyses/TB/Mbovis_4Oct23/work/ff/0786df86b1e84881bb4050cf539925/Mycobacterium_tuberculosis_GCF_000666105.1_ds.fna.gz
-rw-rw-r-- 1 becksts becksts 1325032 Oct  5 13:55 /data/Sequence_analysis/Grandeur/Analyses/TB/Mbovis_4Oct23/work/ff/0786df86b1e84881bb4050cf539925/Mycobacterium_tuberculosis_GCF_000666125.1_ds.fna.gz
-rw-rw-r-- 1 becksts becksts 1316891 Oct  5 13:55 /data/Sequence_analysis/Grandeur/Analyses/TB/Mbovis_4Oct23/work/ff/0786df86b1e84881bb4050cf539925/Mycobacterium_tuberculosis_GCF_000729745.1_ds.fna.gz
-rw-rw-r-- 1 becksts becksts 1316730 Oct  5 13:55 /data/Sequence_analysis/Grandeur/Analyses/TB/Mbovis_4Oct23/work/ff/0786df86b1e84881bb4050cf539925/Mycobacterium_tuberculosis_GCF_000729755.1_ds.fna.gz
-rw-rw-r-- 1 becksts becksts 1311795 Oct  5 13:55 /data/Sequence_analysis/Grandeur/Analyses/TB/Mbovis_4Oct23/work/ff/0786df86b1e84881bb4050cf539925/Mycobacterium_tuberculosis_GCF_000729765.1_ds.fna.gz
-rw-rw-r-- 1 becksts becksts 1317832 Oct  5 13:55 /data/Sequence_analysis/Grandeur/Analyses/TB/Mbovis_4Oct23/work/ff/0786df86b1e84881bb4050cf539925/Mycobacterium_tuberculosis_GCF_000749605.1_ds.fna.gz
-rw-rw-r-- 1 becksts becksts 1313063 Oct  5 13:55 /data/Sequence_analysis/Grandeur/Analyses/TB/Mbovis_4Oct23/work/ff/0786df86b1e84881bb4050cf539925/Mycobacterium_tuberculosis_GCF_000749615.1_ds.fna.gz
-rw-rw-r-- 1 becksts becksts 1316924 Oct  5 13:55 /data/Sequence_analysis/Grandeur/Analyses/TB/Mbovis_4Oct23/work/ff/0786df86b1e84881bb4050cf539925/Mycobacterium_tuberculosis_GCF_001593225.1_ds.fna.gz
-rw-rw-r-- 1 becksts becksts 1330322 Oct  5 13:55 /data/Sequence_analysis/Grandeur/Analyses/TB/Mbovis_4Oct23/work/ff/0786df86b1e84881bb4050cf539925/Mycobacterium_tuberculosis_GCF_002982335.1_ds.fna.gz
-rw-rw-r-- 1 becksts becksts 1330589 Oct  5 13:55 /data/Sequence_analysis/Grandeur/Analyses/TB/Mbovis_4Oct23/work/ff/0786df86b1e84881bb4050cf539925/Mycobacterium_tuberculosis_GCF_014489235.1_ds.fna.gz

Subdirs:

fastani/ contains 2023TB-0113.txt, 2023TB-0113.txt.matrix, and 2023TB-0113_fastani.csv

logs/average_nucleotide_identity:fastani/ contains 2023TB-0113.98052605-423d-4682-a72f-ce8368545c23.log

The following work file looks more interesting to me: 31/9d0ddad0985b3a56af0ba62d76303d/logs/information\:size/2023TB-0122.98052605-423d-4682-a72f-ce8368545c23.log (see the attachment below, especially the 5th line from the end).

Attachment: 2023TB-0122.98052605-423d-4682-a72f-ce8368545c23.log

erinyoung commented 10 months ago

The log file you shared was for the 'size' process. Do you have the one for fastani?

I wouldn't worry about the message "The expected genome size based on Mycobacterium and tuberculosis was not found". It means that Grandeur didn't find a TB genome size in genome_sizes.json, which is expected. I can add TB and a few other Mycobacterium species in the next update, but that's not going to fix your issue.

erinyoung commented 10 months ago

> The pipeline ran when using only "--current_datasets true", but not when I specified "--fastani_ref", with or without "--current_datasets true".

I think I've fixed this issue with version 3.5.20231010, or, more precisely, I've fixed the documentation. I still recommend the newest version, though.

When params.msa = true, the input fasta files for fastani need to follow the naming pattern genus_species_<else>.fasta (or genus_species_<else>.fasta.gz if compressed).
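To check a reference file name against this convention before launching a run, something like the following Python sketch may help. The regular expression is an assumption made for illustration: it treats genus and species as purely alphabetic and also accepts .fna and .fa, since the files listed in the work directory above end in .fna.gz. Grandeur's actual check may differ, so treat the result as a hint rather than a guarantee.

```python
import re

# Assumption: genus and species are alphabetic only, and .fna/.fa are accepted
# alongside .fasta. This is an illustrative pattern, not Grandeur's own check.
NAME_PATTERN = re.compile(r"^[A-Za-z]+_[A-Za-z]+_.+\.(fasta|fna|fa)(\.gz)?$")


def follows_naming_convention(filename: str) -> bool:
    """Return True if the file name looks like genus_species_<else>.<ext>[.gz]."""
    return NAME_PATTERN.match(filename) is not None


for name in (
    "Mycobacterium_tuberculosis_GCF_000195955.2_ds.fna.gz",
    "GCF_005156105.1_ASM515610v1_genomic.fna",
):
    print(name, follows_naming_convention(name))
```

Under these assumptions, the first name (taken from the work directory listing) passes, while a file named by accession only, such as the one passed to --fastani_ref earlier in this thread, does not.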

Hopefully the documentation is clearer this time. Wiki page: https://github.com/UPHL-BioNGS/Grandeur/wiki/fastani

My apologies that this has been so painful.

erinyoung commented 10 months ago

I think https://github.com/UPHL-BioNGS/Grandeur/pull/138 is going to fix your issue.

Let me know if you run into issues!!!