bioinfo-ut / PlasmidSeeker

A k-mer based program for the identification of known plasmids from whole-genome sequencing reads
BSD 3-Clause "New" or "Revised" License

Plasmid names file .../PlasmidSeeker-master/PlasmidSeeker_DB/db_w20_fna/names.txt missing! at plasmidseeker.pl line 264. #24

Open ilkaybuysal opened 3 years ago

ilkaybuysal commented 3 years ago

Dear developers,

The error I'm getting mentions a names.txt file. Should I create that file manually myself? Could you elaborate a bit more on this, please?

Thanks in advance.

The exact cmd I'm running is: perl plasmidseeker.pl -d .../SOFTWARE/PlasmidSeeker_ALL/PlasmidSeeker-master/PlasmidSeeker_DB/db_w20_fna -i .../Documents/PROJECTS/2021_03_08_Plasmid_software_testing/analyses/results_running_w_software/only_bacteria_genomes_20X/plasmidseeker/input_for_plasmidseeker/all_bacteria_R1_R2_merged.fq -b .../Documents/PROJECTS/2021_03_08_Plasmid_software_testing/analyses/assembly_results/all_bacteria_20X/contigs.fasta -o .../Documents/PROJECTS/2021_03_08_Plasmid_software_testing/analyses/results_running_w_software/only_bacteria_genomes_20X/plasmidseeker/output_plasmidseeker/ -k --verbose

Loading database... Plasmid names file .../SOFTWARE/PlasmidSeeker_ALL/PlasmidSeeker-master/PlasmidSeeker_DB/db_w20_fna/names.txt missing! at plasmidseeker.pl line 264.

mihkelvaher commented 3 years ago

It's a bit odd that the names.txt file is missing. In addition to that file, the database directory should also contain *.list files. names.txt maps the *.list files to the names shown in the output. The head of names.txt:

# Database: ./all_plasmid_ps_db/    Plasmids total: 19782   Built on: Sun Nov 15 11:00:01 2020  K-mer length: 32
#
plasmid_1.fna   >NC_004464.2 Citrobacter freundii plasmid pCTX-M3, complete sequence    89468
plasmid_10.fna  >NC_002013.1 Staphylococcus aureus plasmid pC194, complete sequence 2910
plasmid_100.fna >NC_004758.1 Neisseria meningitidis plasmid pJS-B, complete sequence    7245

Note that the list files in the database directory are named plasmid_X.fna_K.list, where K is the k-mer length (32 in the example). The second column contains the names shown in the output, and the third is the length of the plasmid (in the .fna file). If I remember correctly, the 2nd and 3rd columns are only used in the output, so they can be set to anything if there's a need to change something.
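To make the layout described above concrete, here is a minimal Python sketch (not part of PlasmidSeeker) that reads a names.txt-style file and reports which expected plasmid_X.fna_K.list files are absent from the database directory. The function names `parse_names_file` and `missing_list_files` are hypothetical, and the sketch assumes the columns are whitespace-separated with the plasmid length as the last field, as in the example head shown above.

```python
import re
from pathlib import Path


def parse_names_file(path):
    """Return (kmer_length, entries) from a names.txt-style file.

    entries is a list of (fasta_name, output_name, length) tuples.
    Assumption: the first field contains no whitespace and the last
    field is the plasmid length; everything in between is the name.
    """
    kmer_length = None
    entries = []
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            # The header comment carries the k-mer length, e.g. "K-mer length: 32".
            match = re.search(r"K-mer length:\s*(\d+)", line)
            if match:
                kmer_length = int(match.group(1))
            continue
        fasta_name, rest = line.split(None, 1)
        output_name, length = rest.rsplit(None, 1)
        # The length is kept as text, since the thread says it is only echoed in the output.
        entries.append((fasta_name, output_name.strip(), length))
    return kmer_length, entries


def missing_list_files(db_dir):
    """List the plasmid_X.fna_K.list file names that names.txt implies but db_dir lacks."""
    db_dir = Path(db_dir)
    kmer_length, entries = parse_names_file(db_dir / "names.txt")
    if kmer_length is None:
        raise ValueError("no 'K-mer length' found in the names.txt header")
    expected = [f"{fasta_name}_{kmer_length}.list" for fasta_name, _, _ in entries]
    return [name for name in expected if not (db_dir / name).is_file()]
```

Calling `missing_list_files` on a database directory returns an empty list when every entry in names.txt has its matching list file; a missing names.txt raises `FileNotFoundError`, which corresponds to the situation reported in this issue.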

ilkaybuysal commented 3 years ago

So the names.txt file should be in the same directory when I download one of the files from https://bioinfo.ut.ee/plasmidseeker/?

mihkelvaher commented 3 years ago

There are links to an older and a newer database listed in the README.md. Not all files listed at https://bioinfo.ut.ee/plasmidseeker/ are databases; some only contain the FASTA sequences used to create the database, which seems to be your case (...PlasmidSeeker_DB/db_w20_fna/names.txt missing!...).
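A quick way to tell the two kinds of download apart is to look at what the unpacked directory contains. The sketch below is only an illustration of that idea, not a PlasmidSeeker feature: the function name `describe_db_dir` and the set of FASTA file extensions are assumptions, and the "built database" test simply follows the description in this thread (a names.txt file plus *.list files).

```python
from pathlib import Path

# Assumption: FASTA inputs use one of these common extensions.
FASTA_SUFFIXES = {".fna", ".fa", ".fasta"}


def describe_db_dir(db_dir):
    """Classify a directory as a built database, FASTA-only, or unrecognized."""
    files = [p for p in Path(db_dir).iterdir() if p.is_file()]
    has_names = any(p.name == "names.txt" for p in files)
    n_lists = sum(p.suffix == ".list" for p in files)
    n_fasta = sum(p.suffix in FASTA_SUFFIXES for p in files)
    if has_names and n_lists:
        return "built database"
    if n_fasta and not n_lists:
        return "FASTA sequences only"
    return "unrecognized"
```

A directory that comes back as "FASTA sequences only" would still need to be built into a database before it can be passed to the -d option.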

ilkaybuysal commented 3 years ago

Thank you for the quick response.

I'm assuming that Ver 2. (Nov 2020) with 19,782 plasmids, plasmidseeker_db_w20_Nov-2021.tar.gz, is an already built database and doesn't need to be built again. Is that right?

mihkelvaher commented 3 years ago

Yes.