CCB-SB / plsdb

PLSDB pipeline to collect bacterial plasmids from NCBI
https://ccb-microbe.cs.uni-saarland.de/plsdb/

PLSDB needs a minimum size threshold for including new plasmids. #8

Closed: jnesme closed this issue 2 years ago

jnesme commented 3 years ago

Dear devs,

PLSDB is becoming a reference for a curated set of complete plasmid sequences and, in my view, should be conservative rather than exhaustive. Some very small plasmids with inconsistent naming in NCBI RefSeq have made their way into PLSDB. Most sequences from this study are an example: although the sequence titles say "complete", each is just a 200 bp fragment cloned in DH5alpha https://www.sciencedirect.com/science/article/pii/S0378113519311782

The smallest experimentally verified plasmid size in free-living bacteria is currently 746 bp (https://academic.oup.com/femsec/article/92/4/fiw043/2197854)

It goes as low as 744 bp for a very degenerate symbiont genome (verified by Southern blot): https://www.sciencedirect.com/science/article/pii/S0092867413006466

I strongly suggest setting a minimum sequence length for inclusion around these values.
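A length filter of this kind is simple to apply. Below is a minimal sketch in Python, assuming the candidate records arrive as FASTA text and using a hypothetical cutoff of 744 bp taken from the values above. The names `parse_fasta`, `filter_by_length` and `MIN_PLASMID_LENGTH` are illustrative only and are not part of the PLSDB pipeline.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical cutoff: smallest verified plasmid size cited above (744 bp).
MIN_PLASMID_LENGTH = 744


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span multiple lines
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(
    records: Iterable[Tuple[str, str]], min_length: int = MIN_PLASMID_LENGTH
) -> List[Tuple[str, str]]:
    """Keep only records whose sequence is at least min_length bp long."""
    return [(h, s) for h, s in records if len(s) >= min_length]


if __name__ == "__main__":
    # Hypothetical records: a 200 bp cloned fragment and an 800 bp plasmid.
    example = [">short_fragment", "ACGT" * 50, ">small_plasmid", "ACGT" * 200]
    kept = filter_by_length(parse_fasta(example))
    print([h for h, _ in kept])  # ['small_plasmid']
```

In practice the cutoff would be a configurable parameter; the exact value PLSDB adopted is not stated in this thread.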

Best regards, Joseph

VGalata commented 3 years ago

Dear @jnesme,

Thank you very much for your feedback and the suggestion to filter out very short sequences! I agree: a minimum length cutoff would be desirable and makes sense.

@Xethic @SmalJonni Could you consider including this in the next database update?

Xethic commented 3 years ago

Dear @jnesme

I can only echo Valentina's comment and thank you. We confirm that your request is on our feature list for the next major release, scheduled for this summer.

Best, Fabian

SmalJonni commented 2 years ago

A minimum size threshold was added and is now part of filtering rule 4.