arunbodd opened 1 month ago
My apologies for my late reply!
Generally we use Grandeur with fasta files for two things:
I don't have this built into Grandeur (it's a long story, but a lot of sites, such as the ENA, are blocked locally).
For phylogenetic analysis, this is what we use for testing with GitHub Actions (I'm assuming you're curious about the phylogenetic analysis):
mkdir fastas
cd fastas
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/013/783/245/GCA_013783245.1_ASM1378324v1/GCA_013783245.1_ASM1378324v1_genomic.fna.gz && gzip -d GCA_013783245.1_ASM1378324v1_genomic.fna.gz
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/026/626/185/GCA_026626185.1_ASM2662618v1/GCA_026626185.1_ASM2662618v1_genomic.fna.gz && gzip -d GCA_026626185.1_ASM2662618v1_genomic.fna.gz
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/020/808/985/GCA_020808985.1_ASM2080898v1/GCA_020808985.1_ASM2080898v1_genomic.fna.gz && gzip -d GCA_020808985.1_ASM2080898v1_genomic.fna.gz
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/904/863/225/GCA_904863225.1_KSB1_6J/GCA_904863225.1_KSB1_6J_genomic.fna.gz && gzip -d GCA_904863225.1_KSB1_6J_genomic.fna.gz
cd ../
nextflow run . -profile docker,msa --fastas fastas
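The download URLs above all follow the same NCBI FTP layout: the nine digits of the accession are split into three directories, followed by a folder named `<accession>_<assembly name>`. Here is a minimal bash sketch that rebuilds such a URL from those two pieces. It assumes that layout holds for the genomes you want, and the function name `ncbi_genomic_url` is hypothetical, not part of Grandeur:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the NCBI FTP URL of the *_genomic.fna.gz file for an
# accession (e.g. GCA_013783245.1) and its assembly name (e.g. ASM1378324v1).
# Assumes the genomes/all/<GCA|GCF>/ddd/ddd/ddd/ layout used in the URLs above.
ncbi_genomic_url() {
  local acc="$1" asm="$2"
  local prefix="${acc%%_*}"          # "GCA" or "GCF"
  local digits="${acc#*_}"           # "013783245.1"
  digits="${digits%%.*}"             # "013783245" (version suffix dropped)
  local dir="${acc}_${asm}"          # "GCA_013783245.1_ASM1378324v1"
  printf 'https://ftp.ncbi.nlm.nih.gov/genomes/all/%s/%s/%s/%s/%s/%s_genomic.fna.gz\n' \
    "$prefix" "${digits:0:3}" "${digits:3:3}" "${digits:6:3}" "$dir" "$dir"
}
```

For example, `wget "$(ncbi_genomic_url GCA_904863225.1 KSB1_6J)" && gzip -d GCA_904863225.1_KSB1_6J_genomic.fna.gz` reproduces the last download above. NCBI may rewrite unusual characters in assembly names when naming these folders, so double-check the URL if a download fails.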
OR
Instead of pointing the workflow at a directory, you can give it a list of fasta files. This is the required option when using cloud resources.
ls fastas/* > fastas.txt
fastas.txt should contain one path per line, like so:
fastas/GCA_013783245.1_ASM1378324v1_genomic.fna
fastas/GCA_026626185.1_ASM2662618v1_genomic.fna
fastas/GCA_020808985.1_ASM2080898v1_genomic.fna
fastas/GCA_904863225.1_KSB1_6J_genomic.fna
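Before launching, a quick check of the list can catch typos or failed downloads. Below is a minimal bash sketch. The function `check_fasta_list` is hypothetical. It only checks that each listed file exists, is non-empty and starts with a `>` header line, so it does not fully validate the FASTA format, and it only works for local paths, not cloud storage URIs:

```bash
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a file that lists one FASTA path per line.
# Returns non-zero if any entry is missing, empty, or does not start with ">".
check_fasta_list() {
  local list="$1" bad=0 f
  while IFS= read -r f || [ -n "$f" ]; do
    [ -z "$f" ] && continue                      # ignore blank lines
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      bad=1
    elif [ "$(head -c 1 "$f")" != ">" ]; then
      echo "no '>' header on first line: $f" >&2
      bad=1
    fi
  done < "$list"
  return "$bad"
}
```

Usage: `check_fasta_list fastas.txt && nextflow run . -profile docker,msa --fasta_list fastas.txt`.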
nextflow run . -profile docker,msa --fasta_list fastas.txt
This produces a summary file with one or two key results from each analysis:
sample file version per_core_genome_genes warnings amrfinder_genes_(per_cov/per_ident) predicted_organism mlst_matching_pubmlst_scheme mlst_st fastani_top_organism fastani_top_reference fastani_top_ani_estimate fastani_top_total_query_sequence_fragments fastani_top_fragments_aligned_as_orthologous_matches mash_reference mash_mash-distance mash_p-value mash_matching-hashes mash_organism plasmidfinder_plasmid_(identity) kleborate_virulence_score kleborate_resistance_score
GCA_013783245.1_ASM1378324v1_genomic GCA_013783245.1_ASM1378324v1_genomic.fna 4.5.24184 84.42 Multiple FastANI hits,Low core genes, ['arsA (100.00/98.63)', 'arsB (100.00/98.83)', 'arsC (100.00/99.29)', 'arsD (100.00/90.83)', 'arsR (100.00/98.28)', 'blaSHV-11 (100.00/100.00)', 'emrD (100.00/99.49)', 'fieF (100.00/100.00)', 'fosA (100.00/99.28)', 'oqxA (100.00/100.00)', 'oqxB (100.00/100.00)', 'pcoA (100.00/100.00)', 'pcoB (100.00/100.00)', 'pcoC (100.00/100.00)', 'pcoD (100.00/99.68)', 'pcoE (100.00/97.92)', 'pcoR (100.00/100.00)', 'pcoS (100.00/99.57)', 'pmrB_R256G (100.00/99.45)', 'silA (100.00/98.85)', 'silB (100.00/97.91)', 'silC (100.00/99.35)', 'silE (100.00/92.31)', 'silF (100.00/99.15)', 'silP (99.64/94.53)', 'silR (100.00/98.23)', 'silS (100.00/98.78)'] Klebsiella_pneumoniae klebsiella 37 Klebsiella_pneumoniae Klebsiella_pneumoniae_GCF_000240185.1.fna.gz 99.124 1653 1791 refseq-NZ-1328379-PRJNA224116-SAMN02138587-GCF_000567645.1-.-Klebsiella_pneumoniae_MGH_47.fna 0.00305472 0 883/1000 Klebsiella_pneumoniae ['Col440I (92.73)', 'IncFIB(K) (98.93)', 'IncFII(K) (100.0)'] 0 0
GCA_020808985.1_ASM2080898v1_genomic GCA_020808985.1_ASM2080898v1_genomic.fna 4.5.24184 84.62 Multiple FastANI hits,Low core genes, ['blaSHV-11 (100.00/100.00)', 'emrD (100.00/99.49)', 'fieF (100.00/100.00)', 'fosA (100.00/100.00)', 'fosA7 (65.71/91.30)', 'oqxA (100.00/100.00)', 'oqxB (100.00/99.81)'] Klebsiella_pneumoniae klebsiella 1017 Klebsiella_pneumoniae Klebsiella_pneumoniae_GCF_022869665.1.fna.gz 99.0812 1579 1779 refseq-NZ-1438805-PRJNA224116-SAMN02581266-NZ_JJNJ-.-Klebsiella_pneumoniae_UCI_60.fna 0.00823165 0 726/1000 Klebsiella_pneumoniae ['IncFIB(pKPHS1) (99.46)'] 0 0
GCA_026626185.1_ASM2662618v1_genomic GCA_026626185.1_ASM2662618v1_genomic.fna 4.5.24184 82.57 Multiple FastANI hits,Low core genes, "['aac(3)-IVa (100.00/100.00)', 'aadA1 (100.00/100.00)', 'aadA2 (100.00/100.00)', 'aadA2 (100.00/100.00)', ""aph(3'')-Ib (100.00/100.00)"", ""aph(3'')-Ib (100.00/100.00)"", ""aph(3'')-Ib (100.00/99.63)"", ""aph(3')-IIa (100.00/100.00)"", ""aph(3')-Ia (100.00/100.00)"", 'aph(4)-Ia (100.00/100.00)', 'aph(6)-Id (100.00/100.00)', 'aph(6)-Id (100.00/100.00)', 'aph(6)-Id (100.00/100.00)', 'armA (100.00/100.00)', 'blaCTX-M-14 (100.00/100.00)', 'blaDHA-1 (100.00/100.00)', 'blaSHV-25 (100.00/100.00)', 'blaTEM-1 (100.00/100.00)', 'ble (61.11/96.20)', 'cmlA1 (100.00/100.00)', 'dfrA12 (100.00/100.00)', 'emrD (100.00/99.49)', 'fieF (100.00/100.00)', 'floR (100.00/99.75)', 'fosA (99.28/100.00)', 'fosA3 (100.00/100.00)', 'gyrA_S83I (100.00/99.77)', 'mph(A) (100.00/100.00)', 'mph(E) (100.00/100.00)', 'msr(E) (100.00/100.00)', 'oqxA (100.00/100.00)', 'oqxB (100.00/100.00)', 'parC_S80I (98.84/99.41)', 'qacE (82.61/95.79)', 'qacEdelta1 (100.00/100.00)', 'qacL (100.00/100.00)', 'qnrB4 (100.00/100.00)', 'qnrS1 (100.00/100.00)', 'rmtB1 (100.00/100.00)', 'sul1 (100.00/100.00)', 'sul1 (100.00/100.00)', 'sul2 (100.00/100.00)', 'sul3 (100.00/100.00)', 'terB (100.00/100.00)', 'terC (100.00/99.13)', 'terD (100.00/98.96)', 'terE (100.00/99.48)', 'tet(A) (100.00/99.75)', 'tmexC (100.00/99.74)', 'tmexD (100.00/99.90)', 'toprJ1 (100.00/100.00)']" Klebsiella_pneumoniae klebsiella 789 Klebsiella_pneumoniae Klebsiella_pneumoniae_GCF_000240185.1.fna.gz 99.1677 1665 1861 refseq-NZ-573-PRJNA224116-SAMN02777842-GCF_000739495.1-.-Klebsiella_pneumoniae.fna 0.0083078 0 724/1000 Klebsiella_pneumoniae ['Col(pHAD28) (91.6)', 'Col440I (91.23)', 'IncFIB(pNDM-Mar) (99.32)', 'IncHI1B(pNDM-MAR) (100.0)', 'IncR (100.0)', 'IncX1 (98.4)'] 0 1
GCA_904863225.1_KSB1_6J_genomic GCA_904863225.1_KSB1_6J_genomic.fna 4.5.24184 83.44 Multiple FastANI hits,Low core genes, "[""aac(6')-Ib-cr5 (100.00/100.00)"", ""aph(3'')-Ib (100.00/100.00)"", 'aph(6)-Id (100.00/100.00)', 'arsA (100.00/100.00)', 'arsB (100.00/100.00)', 'arsC (100.00/100.00)', 'arsD (100.00/91.67)', 'arsR (100.00/100.00)', 'blaCTX-M-15 (100.00/100.00)', 'blaOXA-1 (100.00/100.00)', 'blaSHV-1 (100.00/100.00)', 'blaTEM-1 (100.00/100.00)', 'catB3 (70.00/100.00)', 'clpK (97.89/99.25)', 'crcB (100.00/100.00)', 'dfrA14 (100.00/100.00)', 'emrD (100.00/99.49)', 'fieF (100.00/100.00)', 'fosA (100.00/99.28)', 'fosA7 (100.00/91.43)', 'hsp20 (100.00/100.00)', 'oqxA (100.00/100.00)', 'oqxB19 (100.00/100.00)', 'pcoA (100.00/100.00)', 'pcoB (100.00/100.00)', 'pcoC (100.00/100.00)', 'pcoD (100.00/99.68)', 'pcoE (100.00/94.44)', 'pcoR (100.00/100.00)', 'pcoS (100.00/99.14)', 'qnrB1 (100.00/100.00)', 'silA (100.00/98.85)', 'silB (100.00/97.91)', 'silC (100.00/100.00)', 'silE (100.00/91.61)', 'silF (100.00/99.15)', 'silP (99.64/94.18)', 'silR (100.00/100.00)', 'silS (100.00/100.00)', 'sul2 (100.00/100.00)', 'tet(A) (100.00/100.00)']" Klebsiella_pneumoniae klebsiella 323 Klebsiella_pneumoniae Klebsiella_pneumoniae_GCF_000240185.1.fna.gz 99.0536 1616 1825 refseq-NZ-573-PRJNA224116-SAMEA2602936-NZ_CCGN-.-Klebsiella_pneumoniae.fna 0.000434439 0 982/1000 Klebsiella_pneumoniae ['Col(pHAD28) (100.0)', 'IncFIB(K) (98.93)', 'IncFII(K) (95.95)'] 0 1
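Because the summary has many columns, it can help to pull out only the ones you need. Here is a minimal bash/awk sketch. It assumes the summary file is tab-separated with a single header row using the column names shown above, and that no field contains a tab. The function name `select_columns` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print only the named columns of a tab-separated file.
# Usage: select_columns FILE COLUMN [COLUMN ...]
select_columns() {
  local file="$1"; shift
  awk -F'\t' -v OFS='\t' -v want="$*" '
    NR == 1 {
      n = split(want, names, " ")                # requested column names
      for (i = 1; i <= NF; i++) idx[$i] = i      # header name -> field number
      for (j = 1; j <= n; j++) {
        if (!(names[j] in idx)) { print "no such column: " names[j] > "/dev/stderr"; exit 1 }
        cols[j] = idx[names[j]]
      }
    }
    {
      line = $(cols[1])
      for (j = 2; j <= n; j++) line = line OFS $(cols[j])
      print line
    }' "$file"
}
```

For example, `select_columns grandeur/grandeur_summary.tsv sample predicted_organism mlst_st kleborate_resistance_score`. That path is only a guess, so point it at wherever your run wrote the summary.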
There is also a Newick tree file generated with iqtree2:
(GCA_020808985.1_ASM2080898v1_genomic:0.0038882302,(((GCA_013783245.1_ASM1378324v1_genomic:0.0035602662,GCA_026626185.1_ASM2662618v1_genomic:0.0030635049)67.8/75:0.0004092349,Klebsiella_pneumoniae_GCF_000240185.1:0.0032600322)100/100:0.0009262356,Klebsiella_pneumoniae_GCF_022869665.1:0.0046128026)99.6/99:0.0005900806,GCA_904863225.1_KSB1_6J_genomic:0.0039588357);
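The tree also contains two Klebsiella_pneumoniae_GCF_* genomes that were not in `fastas/`. They match the `fastani_top_reference` values in the summary, so they appear to be reference genomes added by the workflow. To list what ended up in a tree, here is a rough bash sketch that pulls the tip names and internal node labels out of a single-line Newick file with `grep` and `sed`. It is not a general Newick parser: it assumes tip names contain no parentheses, commas, colons or semicolons, and that every node has a branch length. The function names are hypothetical. Internal labels such as `67.8/75` look like IQ-TREE's paired support values (SH-aLRT/ultrafast bootstrap), but check which support options are passed to iqtree2 before interpreting them.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for a single-line Newick file such as the one above.
# Tip names are labels that directly follow "(" or "," and precede ":".
newick_tips() {
  grep -oE '[(,][^(),:;]+:' "$1" | sed -E 's/^[(,]//; s/:$//'
}
# Internal node labels (here, support values) directly follow ")" and precede ":".
newick_node_labels() {
  grep -oE '\)[^(),:;]+:' "$1" | sed -E 's/^\)//; s/:$//'
}
```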
A SNP distance matrix generated with snp-dists:
snp-dists 0.8.2,GCA_020808985.1_ASM2080898v1_genomic,GCA_013783245.1_ASM1378324v1_genomic,Klebsiella_pneumoniae_GCF_022869665.1,GCA_026626185.1_ASM2662618v1_genomic,Klebsiella_pneumoniae_GCF_000240185.1,GCA_904863225.1_KSB1_6J_genomic
GCA_020808985.1_ASM2080898v1_genomic,0,26202,26340,24554,25128,24777
GCA_013783245.1_ASM1378324v1_genomic,26202,0,26648,21896,22221,26669
Klebsiella_pneumoniae_GCF_022869665.1,26340,26648,0,26209,26246,26393
GCA_026626185.1_ASM2662618v1_genomic,24554,21896,26209,0,20967,25626
Klebsiella_pneumoniae_GCF_000240185.1,25128,22221,26246,20967,0,25609
GCA_904863225.1_KSB1_6J_genomic,24777,26669,26393,25626,25609,0
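snp-dists writes a square, comma-separated matrix with the tool version in the top-left cell. This makes pairwise questions easy to answer with `awk`. Here is a minimal sketch that reports the pair of distinct samples with the fewest SNP differences. The function name `closest_pair` is hypothetical, and ties keep the first pair encountered:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the closest pair of distinct samples in a
# snp-dists CSV matrix as "sample_a sample_b snp_count".
closest_pair() {
  awk -F',' '
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; next }   # column names
    {
      for (i = 2; i <= NF; i++) {
        if (name[i] == $1) continue                             # skip self-comparison
        d = $i + 0
        if (best == "" || d < best) { best = d; a = $1; b = name[i] }
      }
    }
    END { print a, b, best }' "$1"
}
```

On the matrix above, this prints `GCA_026626185.1_ASM2662618v1_genomic Klebsiella_pneumoniae_GCF_000240185.1 20967`.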
The workflow produces other outputs as well.
More information can be found on our wiki pages https://github.com/UPHL-BioNGS/Grandeur/wiki/Phylogenetic-Analysis, https://github.com/UPHL-BioNGS/Grandeur/wiki/USAGE#fasta-files, and https://github.com/UPHL-BioNGS/Grandeur/wiki/phylogenetic_analysis.
Did this work for you?
Hello Developer,
Could you please provide at least a test.config with test fasta files so I can run this pipeline and understand its output?
Thank you.