CCB-SB / plsdb

PLSDB pipeline to collect bacterial plasmids from NCBI
https://ccb-microbe.cs.uni-saarland.de/plsdb/

Provide plasmid sequences for download #9

Closed: oschwengers closed this issue 3 years ago

oschwengers commented 3 years ago

Thanks for providing & maintaining this wonderful plasmid resource! The zip file provided in the download section at https://ccb-microbe.cs.uni-saarland.de/plsdb/plasmids/download/ contains the BLAST database but not the actual sequences themselves.

I'm very interested in these for several analyses, and I'm sure others might also be interested in them. Would it be possible to provide these as a multi-record FASTA file (plsdb.fna) within the zip file? Thanks again and best regards!

VGalata commented 3 years ago

Dear @oschwengers,

Thank you very much for your feedback!

The plasmid sequences can be extracted from the provided BLAST databases - see issue #7. I understand that it would be more convenient for users to have the FASTA file inside the archive, but we would like to avoid storing redundant information.
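
For readers who prefer to script the extraction, here is a minimal sketch in Python that wraps `blastdbcmd` from NCBI BLAST+. It is not taken from issue #7 or from the PLSDB README. The database prefix `plsdb.fna` and the output file name are assumptions, and the helper names `build_blastdbcmd_args` and `extract_fasta` are hypothetical. Adjust the prefix to match the files in the unpacked archive.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_blastdbcmd_args(db_prefix: str, out_fasta: str) -> List[str]:
    """Build a blastdbcmd command line that dumps every record of a BLAST database.

    db_prefix is the database name without the index extensions
    (e.g. .nsq/.nin/.nhr); the value used for PLSDB is an assumption.
    """
    return [
        "blastdbcmd",
        "-db", db_prefix,   # BLAST database prefix (assumed name)
        "-entry", "all",    # select all sequences in the database
        "-out", out_fasta,  # FASTA is blastdbcmd's default output format
    ]


def extract_fasta(db_prefix: str, out_fasta: str) -> Path:
    """Run blastdbcmd and return the path of the written FASTA file."""
    if shutil.which("blastdbcmd") is None:
        raise RuntimeError("blastdbcmd not found; install NCBI BLAST+ first")
    subprocess.run(build_blastdbcmd_args(db_prefix, out_fasta), check=True)
    return Path(out_fasta)
```

A call such as `extract_fasta("plsdb.fna", "plsdb_sequences.fna")`, run from the directory that holds the unpacked database files, should write a multi-record FASTA file, provided the prefix matches the archive contents.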

Let me know if you have any issues extracting the sequences.

Best regards!

oschwengers commented 3 years ago

@VGalata thanks for your quick reply and the hint - it worked absolutely fine.

However, it appears that several users have asked for the native plasmid sequences. Hence, wouldn't it be preferable to provide the sequences themselves instead of a ready-to-use BLAST database? Then you wouldn't store redundant information, and every user could use them in whichever way they deem best. This is more of a suggestion - it works either way. Maybe a short note would help to avoid repeated requests ;-)

Again, thank you very much and best regards!

VGalata commented 3 years ago

@oschwengers I completely agree with the rationale behind your suggestion. The only reason why I would prefer to keep the BLAST database is to provide users with the database/index files which are used by the web server - to facilitate the replication of the online search results.

Actually, there is a note with the heading "FASTA with plasmid sequences" in the README.md file inside the archive. But I know that it is not always easy to find the relevant information in a long documentation file. :) It is good that this question is now also covered by the issues here in the repository.