ParBLiSS / FastANI

Fast Whole-Genome Similarity (ANI) Estimation
Apache License 2.0

issues with reference list file #99

Open rosave9 opened 2 years ago

rosave9 commented 2 years ago

Hi, I'm having issues running FastANI with a reference list I downloaded from enve-omics.ce.gatech.edu/data/fastani (I downloaded the D1 dataset).

I opened it, converted it into a .txt file, and saw this when I opened it in TextEdit:

gi|918757418|ref|NZ_CP011489.1| Actinobacteria bacterium IMCC26256, complete genome AACGTGGGGCAATATGAGTTCTCCACAGAGCGCATCAAGGGCCCCTACCATGTTTTCAGGCTGGGGACAA CCTCGCAAACTCGCTTAGTTAGCGGCTCTCCGAGGTTATCCACCGTTCGCTATTCGGGCGCTAGTTTGCT (......) next genome NUCLEOTIDE-SEQUENCE (...) etc

Then I opened it on the command line and saw:

Acetobacter_ghanensis.LargeContigs.fna Acetobacterium_woodii_DSM_1030.LargeContigs.fna Acetobacter_pasteurianus_IFO_3283_01.LargeContigs.fna Acetobacter_senegalensis.LargeContigs.fna etc

So I tried a few things:

1) I concatenated all the .fna files into a single .txt file with cat *.fna > database.txt and used that .txt file as my --rf, but that failed.

2) I copied all of the following genome names into one .txt file and used that as my --rf, but it still failed: Acetobacter_ghanensis.LargeContigs.fna Acetobacterium_woodii_DSM_1030.LargeContigs.fna Acetobacter_pasteurianus_IFO_3283_01.LargeContigs.fna Acetobacter_senegalensis.LargeContigs.fna and more...

3) In the file from #2, I added the full path for each genome (e.g. /nfs/turbo/lsi-NPDC/D1-folder/Acetobacter.. etc.) and used that .txt file as my --rf, and it failed again.

4) I tried using the original D1.tar.gz file as my --rf, but that also failed.
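A minimal Python sketch of the usual preparation step, assuming the D1 archive unpacks to genome files ending in .fna (or .fa/.fasta): extract the archive first, then write a plain text list with one absolute genome path per line. That is the layout FastANI's reference-list option expects, as opposed to concatenated sequences (#1), bare file names (#2), or the archive itself (#4). The helper names extract_archive and write_reference_list are hypothetical, not part of FastANI.

```python
import tarfile
from pathlib import Path

# Assumption: genomes in the archive use one of these extensions.
FASTA_SUFFIXES = {".fna", ".fa", ".fasta"}


def extract_archive(archive: Path, out_dir: Path) -> Path:
    """Unpack a .tar.gz of genomes into out_dir and return out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safe "data" extraction filter when this Python supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(out_dir, filter="data")
        else:
            tar.extractall(out_dir)
    return out_dir


def write_reference_list(genome_dir: Path, list_file: Path) -> list[str]:
    """Write one absolute genome path per line and return the paths written."""
    paths = sorted(
        str(p.resolve())
        for p in genome_dir.rglob("*")
        if p.is_file() and p.suffix in FASTA_SUFFIXES
    )
    # Absolute paths keep the list valid regardless of the working directory.
    list_file.write_text("".join(path + "\n" for path in paths))
    return paths
```

Usage, with hypothetical paths: call extract_archive(Path("D1.tar.gz"), Path("D1-folder")), then pass the returned directory to write_reference_list(..., Path("reference_list.txt")) and give reference_list.txt to FastANI. Resolving to absolute paths is a design choice: a list of relative paths only works when FastANI is launched from the directory they are relative to.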

Could I see an example of an --rf file, or get guidance on how to use a downloaded NCBI prokaryote reference genome database in my own pipeline? Thanks
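For reference, as far as I can tell from the FastANI README, the reference-list option is --rl (long form --refList) rather than --rf, and both --rl and --ql (--queryList) take a plain text file listing genome paths, one per line. The Python sketch below checks such a list for the failure modes described above and builds the corresponding command. check_list_file and fastani_command are hypothetical helpers, and the checks are heuristics, not FastANI's own validation.

```python
import tarfile
from pathlib import Path


def check_list_file(list_file: Path) -> list[str]:
    """Return a list of problems; an empty list means every entry looks usable."""
    problems = []
    for n, line in enumerate(list_file.read_text().splitlines(), start=1):
        entry = line.strip()
        if not entry:
            continue
        path = Path(entry)
        # Catches bare names, wrong paths, and FASTA lines pasted into the list.
        if not path.is_file():
            problems.append(f"line {n}: not an existing file: {entry}")
            continue
        # Catches passing the downloaded archive instead of extracted genomes.
        if tarfile.is_tarfile(path):
            problems.append(f"line {n}: tar archive, extract it first: {entry}")
            continue
        with path.open("rb") as handle:
            head = handle.read(2)
        # Gzip-compressed single genomes (magic bytes 1f 8b) are not inspected.
        if head.startswith(b"\x1f\x8b"):
            continue
        if not head.startswith(b">"):
            problems.append(f"line {n}: does not start with a FASTA header: {entry}")
    return problems


def fastani_command(query_list, reference_list, output) -> list[str]:
    """Build (but do not run) a FastANI many-to-many command line."""
    return ["fastANI", "--ql", str(query_list), "--rl", str(reference_list), "-o", str(output)]
```

A valid list file then simply looks like one path per line, e.g. /nfs/turbo/lsi-NPDC/D1-folder/<genome>.fna, where <genome> stands for each extracted file name.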