Closed ARWattam closed 8 years ago
1224148.3 Dickeya chrysanthemi NCPPB 3533
1223569.3 Dickeya chrysanthemi NCPPB 402
1223571.3 Dickeya chrysanthemi NCPPB 516
1224149.3 Dickeya dadantii NCPPB 3537
1223572.3 Dickeya dadantii NCPPB 898
1223574.3 Dickeya dadantii subsp. dadantii NCPPB 2976
1226343.9 Dickeya dianthicola GBBC 2039
1223570.4 Dickeya dianthicola NCPPB 3534
1223568.4 Dickeya dianthicola NCPPB 453
1226344.5 Dickeya solani GBBC 2040
1225786.1 Dickeya solani IPO 2222
1224151.3 Dickeya solani MK10
1224152.3 Dickeya solani MK16
1224144.3 Dickeya sp. CSL RW240
1225785.3 Dickeya sp. DW 0440
1224145.3 Dickeya sp. MK7
568766.3 Dickeya sp. NCPPB 3274
568768.3 Dickeya sp. NCPPB 569
1223567.3 Dickeya zeae CSL RW192
1224153.3 Dickeya zeae MK19
1223573.3 Dickeya zeae NCPPB 2538
1224146.3 Dickeya zeae NCPPB 3531
1224147.4 Dickeya zeae NCPPB 3532
Here are some more that I found. I think that most will turn out to be bad (too long or too short as well): 1223567.3 1223568.4 1223569.3 1223570.4 1223571.3 1223572.3 1223573.3 1223574.3 1224144.3 1224145.3 1224146.3 1224147.4 1224148.3 1224149.3 1224151.3 1224152.3 1224153.3 1225785.3 1225786.3 1226343.9 1226344.5 568766.3 568768.3 999430.4 999431.3 999434.3 1312920.4 1321951.4 1239783.3 139.77 698953.3 1302421.3 1302425.3 1302427.3 1435053.3 1491.410 1491.411 1491.412 1491.413 1491.414 1491.415 1491.416 1491.417 1491.418
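One way to screen for "too long or too short" genomes is to compare each genome's total length against the median for its group. The sketch below is only an illustration: the function name, the 0.5x/1.5x thresholds, and the example lengths are all assumptions, not PATRIC's actual QC code or data.

```python
from statistics import median


def flag_length_outliers(lengths, low=0.5, high=1.5):
    """Return IDs whose length is outside [low, high] times the group median.

    `lengths` maps a genome ID to its total sequence length in bp.
    The thresholds are arbitrary illustrative defaults.
    """
    if not lengths:
        return []
    mid = median(lengths.values())
    return sorted(
        gid for gid, n in lengths.items()
        if n < low * mid or n > high * mid
    )


# Made-up lengths for illustration only; these are not real genome records.
example = {
    "genome_a": 4_800_000,
    "genome_b": 4_900_000,
    "genome_c": 4_700_000,
    "genome_d": 9_600_000,  # about double the others
    "genome_e": 1_200_000,  # far shorter than the others
}
print(flag_length_outliers(example))
```

A flagged genome is only a candidate for review. An unusually long record can still be a faithful copy of what the source archive holds, so the result would need to be checked against the upstream record before anything is deleted.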
I have deleted from the database all Dickeya genomes reported by the user and the other bad genomes reported by Jim, except the Clostridium genomes in Jim's list.
Some of those Clostridium genomes are much longer than other Clostridium genomes. However, they match the corresponding genomes at NCBI. That means the version we have in PATRIC may be a low-quality sequence, but it is not corrupted.
The correct version of the deleted genomes will be added soon.
-Maulik
What is the cause of the bad/corrupted sequences? If we have found quite a few in the course of testing / doing work, there are likely many others, due to some systemic problem.
Ron
(In reply to Maulik Shukla, 11/19/16 9:14 PM)
Ron Kenyon PATRIC Project Manager, patricbrc.org Project Director, Biocomplexity Institute Virginia Tech rkenyon@vbi.vt.edu
Ron,
These problematic genomes were reported by Jim as part of the genome QC matrix he is working on, before they were reported by the users. There were 31 genomes in total (including 23 Dickeya) that were identified to be problematic and deleted from the database.
Most of these genomes were draft genomes in multiple contigs that were later completed into single chromosomes. We had both versions merged as a single genome in the database. This was an artifact of the old data processing workflows in PATRIC2.
The new workflow for incorporating new genomes relies on the NCBI Assembly database and has multiple checks to make sure the same genome doesn't get loaded multiple times.
Any time we find a problematic case, I usually run a check on the entire database to find other similar cases. The same checks are then run periodically.
-Maulik
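The kind of database-wide check described above could look something like the following. This is a minimal sketch under stated assumptions: the record fields `genome_id` and `assembly_accession`, the function name, and the placeholder values are hypothetical, and the thread does not say how the actual PATRIC checks are implemented.

```python
from collections import defaultdict


def find_duplicate_loads(records):
    """Return assembly accessions that are attached to more than one genome ID.

    Each record is a dict with the hypothetical keys
    "genome_id" and "assembly_accession".
    """
    by_accession = defaultdict(set)
    for rec in records:
        by_accession[rec["assembly_accession"]].add(rec["genome_id"])
    return {
        acc: sorted(ids)
        for acc, ids in by_accession.items()
        if len(ids) > 1
    }


# Placeholder records for illustration only; not real accessions or genome IDs.
records = [
    {"genome_id": "g1", "assembly_accession": "ASM_A"},
    {"genome_id": "g2", "assembly_accession": "ASM_B"},
    {"genome_id": "g3", "assembly_accession": "ASM_A"},  # same assembly loaded twice
]
print(find_duplicate_loads(records))
```

Keying on the assembly accession catches the same assembly being loaded twice. It would not, on its own, catch a draft and a finished version merged into one record, which is why a length-based screen is a useful complement.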
(In reply to rkenyon, Nov 21, 2016, 6:40 AM)
OK, that makes sense to me. I had mentioned the issue to Tom during the business call this morning, so he may inquire about it.
Thanks, Ron
(In reply to Maulik Shukla, 11/21/16 9:09 AM)
We got a user ticket about this and just talked to Jim, who confirms that this is a problem. This could be an indication of other problems that we don't know about.