CCB-SB / plsdb

PLSDB pipeline to collect bacterial plasmids from NCBI
https://ccb-microbe.cs.uni-saarland.de/plsdb/

duplicate records #5

Closed haruosuz closed 3 years ago

haruosuz commented 3 years ago

In PLSDB (version 2020_06_29), of the 23227 records, there were 44 duplicate plasmids (i.e., one from INSDC and one from RefSeq; e.g., CP026582.3 and NZ_CP026582.2). Is there any reason to retain both records of each duplicate plasmid?

Rtable.plsdb.txt
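
For readers who want to check their own copy of the table, here is a minimal sketch of one way to flag INSDC/RefSeq pairs such as CP026582.3 and NZ_CP026582.2. It is not necessarily how the 44 duplicates above were found, and it is not the PLSDB pipeline's filtering step. It assumes that a RefSeq accession is the INSDC accession with a two-letter prefix and underscore (e.g. `NZ_`) and that the version suffix can differ between the two records. The function names `base_accession` and `find_duplicate_pairs` and the placeholder accession `NZ_XX000000.1` are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: RefSeq accessions start with two capital letters and an
# underscore (e.g. "NZ_", "NC_"); INSDC accessions have no such prefix.
REFSEQ_PREFIX = re.compile(r"^[A-Z]{2}_")


def base_accession(accession: str) -> str:
    """Drop the version suffix (".3") and any RefSeq prefix ("NZ_")."""
    without_version = accession.split(".", 1)[0]
    return REFSEQ_PREFIX.sub("", without_version)


def find_duplicate_pairs(accessions):
    """Group accessions by base accession; keep groups with more than one record."""
    groups = defaultdict(list)
    for acc in accessions:
        groups[base_accession(acc)].append(acc)
    return {base: sorted(accs) for base, accs in groups.items() if len(accs) > 1}


# The first two accessions are the example pair from this issue;
# the third is a made-up placeholder without an INSDC counterpart.
records = ["CP026582.3", "NZ_CP026582.2", "NZ_XX000000.1"]
print(find_duplicate_pairs(records))
# prints {'CP026582': ['CP026582.3', 'NZ_CP026582.2']}
```

Matching on accessions is only a heuristic: two records with the same base accession but different versions may not have identical sequences, so a sequence-level comparison would be needed to confirm each pair.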

VGalata commented 3 years ago

Dear @haruosuz,

Thank you for reporting this! Apparently, these sequences slipped through our filtering steps. We will investigate this issue and update the code to avoid such cases in the future.

Xethic commented 3 years ago

Hi @haruosuz, we are currently testing and reviewing the new release before rolling out the updated database. In the upcoming release (still to be published in 2020), we have fixed the duplicated entries.

Thanks again for reporting the issue!

Xethic commented 3 years ago

Hey @haruosuz! Happy to announce that we've rolled out the latest update of PLSDB! This release should also resolve the duplicated entries. We would be happy to receive your feedback. Thanks!