R22 Curator Testing - segregation changes

ClinGen / clincoded

This GCI/VCI 1.0 platform has now been retired, and replaced with our new 2.0 platform:

https://github.com/ClinGen/gene-and-variant-curation-tools/issues

MIT License

R22 Curator Testing - segregation changes #1763

Closed wrightmw closed 6 years ago

wrightmw commented 6 years ago

The upcoming R22 release will contain multiple changes to reflect the recent update to segregation scoring in the Gene-Disease Clinical Validity SOP: https://www.clinicalgenome.org/curation-activities/gene-disease-validity/educational-and-training-materials/standard-operating-procedures/

These changes involved updates to both the GCI and the summaries shown on the ClinGen website (clinicalgenome.org), so the new functionality needs to be tested extensively by curators. In particular, please make sure that all fields in the GCI Classification Matrix appear correctly and as expected in the website Summaries.

Tickets associated with this testing include: #1549 #1550 and item 2 of "required tasks" in ticket #1712

wrightmw commented 6 years ago

While you are testing it, please give it some pounding throughout the process. It would be good to test it with as many different scenarios as you can think of to make sure all of them look right and function correctly.

Please look at overall functionality and workflow, but specifically also make sure to look at:

1) editing existing Family evidence

make sure it has reverted to candidate (this is the default for all existing evidence)
make sure you can edit it and save it
make sure you can change it to exome from candidate

2) adding new Family evidence

make sure you can add it and save it
make sure you can save it as exome or candidate

3) viewing the Classification matrix

after you save changes to the edited existing and/or new evidence, make sure all the scores and points in the classification matrix are correct
upon saving your new Classification matrix, make sure the new evidence summary shows all the edited existing and/or new evidence changes you've saved
when you save, make a note of the timestamp for saving the matrix; you have to make sure this matches correctly on the website when you publish it

4) when approving your Classification

try approving without adding a manual Approved date and also with adding a manual Approved date (you'll need to publish both ways, with and without the manually added Approval, so that you can compare these to the website to see the dates are correct for both scenarios)

5) viewing the Summary on the website after publishing (Note: this is a test version of the website, containing production data, this is not the real website)

make sure the scores/counts on the website match those in your Classification matrix in the GCI
make sure the dates match:
- "Last Saved Summary Classification" (bottom of Classification matrix in GCI) = CALCULATED CLASSIFICATION (DATE) on website
- if you modified the classification in the Matrix in the GCI = MODIFIED CLASSIFICATION (DATE) on website
- the automatic date of approval in the GCI will be default, unless you manually added an Approval date, in which case that should appear in preference = EXPERT CURATION (DATE) on the website
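The date checks in item 5 can be sketched as a small lookup for testers. This is only an illustration: the parameter names (`last_saved`, `auto_approved`, `manual_approved`, `modified`) are hypothetical and not taken from the GCI code, and the modified date is passed in separately because the checklist does not say which GCI date it is taken from. Only the three website labels come from the checklist above.

```python
from typing import Optional


def expected_website_dates(
    last_saved: str,
    auto_approved: str,
    manual_approved: Optional[str] = None,
    modified: Optional[str] = None,
) -> dict:
    """Return the dates a tester should expect to see on the website.

    All parameter names are illustrative assumptions, not GCI field names.
    """
    expected = {
        # "Last Saved Summary Classification" at the bottom of the GCI matrix
        "CALCULATED CLASSIFICATION (DATE)": last_saved,
        # A manually added approval date is shown in preference to the automatic one
        "EXPERT CURATION (DATE)": manual_approved or auto_approved,
    }
    if modified is not None:
        # Only expected when the classification was modified in the matrix
        expected["MODIFIED CLASSIFICATION (DATE)"] = modified
    return expected


# Publish once without and once with a manual approval date, then compare
without_manual = expected_website_dates("2018-09-10", "2018-09-10")
with_manual = expected_website_dates(
    "2018-09-10", "2018-09-10", manual_approved="2018-09-01"
)
```

Running both scenarios side by side mirrors item 4, where the classification is published with and without a manual Approved date.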

wrightmw commented 6 years ago

@courtneythaxton @ErinRiggs @marinadistefano @jennygoldstein

Here is a test instance containing production data for you to test the functionality of the GCI segregation scoring based on the latest changes to the SOP:

https://publish-testing-int.demo.clinicalgenome.org/

While you are testing it, please give it some pounding throughout the process, including:

curating the Family evidence (both adding new Family evidence and editing existing Family evidence)
viewing the Classification matrix (especially making sure all the scores and points are correct for all possible scenarios)
saving the Classification
viewing the evidence summary
viewing all the segregation data and dates on the website (Note: this is a test version of the website, this is not the real website and so you are safe to publish whatever you wish onto this test site)

For guidance on some things to look at while testing, please see the panel above this one.

Please feel free to pass this instance on to anyone you feel would add value to this testing process. We really need this new segregation scoring to be tested with as many different potential scenarios as possible to make sure it is functioning correctly.

NOTE: Please put all of your comments into this ticket in GitHub so that our devs can see them while I am on vacation over the next week, and can start working on any required fixes.

Also, we are aware of one bug whereby some decimal points are not showing correctly on the website (it added lots of zeroes after the decimal point). We almost have a fix for this, and so it may already be fixed by the time you look at the website!

Thank you in advance for your help in testing this complex feature!

wrightmw commented 6 years ago

@courtneythaxton @ErinRiggs @marinadistefano @jennygoldstein

Guys, one last thing that I missed off the bottom of that last message. Please finish your testing by CoB on Monday Sept 10th. Thank you!

courtneythaxton commented 6 years ago

Hi All,

I have gone through a little bit of the segregation demo and have found some issues. Of note, I have not fully tested under the time frame given, but I think that it would be good for us to have a few more days or so to sort out and test fully the demo.

Here is what I have found:

(1) I noticed that the website display does not show all the PMIDs associated with the segregation scoring. For instance, I tried the VPS13B: Cohen syndrome curation. There is candidate and exome sequencing for this curation. The candidate sequencing comes from the following articles: (a) Mochida PMID: 15173253, (b) Rafiq PMID: 26104215; and the exome comes from Rafiq PMID: 26104215. The website display only shows evidence for the candidate sequencing and only the Rafiq PMID: 26104215, whereas the exome shows nothing. Link: https://search-staging.clinicalgenome.org/kb/gene-validity/ccfc68bc-9f00-4565-b2c7-70ad9f8e2002--2018-09-10T18:18:25

After talking with the Gene Curation Small working group, we believe that the data exchange must be capturing the segregation section for “Genetic Evidence: Case Level (family segregation information without proband data or scored proband data)” instead of pulling the segregation LOD score PMIDs from the case-level data with probands (see attached evidence summaries).

(1A) The same was true for the UBE2A: syndromic X-linked curation: https://search-staging.clinicalgenome.org/kb/gene-validity/94765cdb-fe12-4098-b28e-20cd58744271--2018-09-10T19:06:37 This actually did not show any of the PMIDs for either segregation category, even though there was 1 for candidate and 2 for exome. Also see the attached evidence summary from the GCI and the website entry.

(2) I noticed that when you "unpublish" an article, the segregation entries for the LOD scores show NaN (for each category, i.e. candidate and exome), but the total LOD score shows the total score. When I don't change anything but go to re-publish the curation, no information for the individual categories is shown, neither the score for each category nor the PMIDs. For example, see the UBE2A: syndromic X-linked ID: https://search-staging.clinicalgenome.org/kb/gene-validity/94765cdb-fe12-4098-b28e-20cd58744271--2018-07-02T16:00:00 Because I have not changed anything, I assume this is the issue, but by default these should be chosen as candidate sequencing and should show the 5.24 in candidate, which it did show in the approved published version before I unpublished it to make changes.

Unfortunately I did not capture images of the NaN from UBE2A, but was able to capture them from PAK3 (see below). Interestingly, if I do not alter anything after the unpublish, and re-publish, I still get the NaN in the segregation (see below PAK3 re-publish classification summary). Equally, the website display fails to show any numbers corresponding to the segregation evidence categories, and no PMIDs (see image).

Of note, the provisional and approval steps have become a little clunky; I'm not sure if that is from the demo or not, but they take quite a while.

Best, Courtney

Courtney Thaxton, Ph.D. ClinGen Senior Biocurator Berg Lab, UNC Dept. of Genetics 120 Mason Farm Road 5100 B, Genetic Medicine Building CB#7264 University of North Carolina, Chapel Hill, NC 27599 919-966-9562

courtneythaxton commented 6 years ago

I’ll check out the UI items! Scott

jimmyzhen commented 6 years ago

Hi @courtneythaxton,

After talking with the Gene Curation Small working group, we believe that the data exchange must be capturing the segregation section for “Genetic Evidence: Case Level (family segregation information without proband data or scored proband data)” instead of pulling the segregation LOD score PMIDs from the case-level data with probands (see attached evidence summaries).

If I am not mistaken, the GCI passes the PMID data associated with the family segregation to the website as long as its LOD score is included in the classification, regardless of the presence of proband data. The website would then just display the PMID data as provided. @bryanwulf and @sgoehringer can weigh in on this.

I noticed that when you "unpublish" an article, the segregation entries for the LOD scores show NaN (for each category, i.e. candidate and exome), but the total LOD score shows the total score.

Is this behavior happening in GCI's Classification Matrix or on the website?

Of note, the provisional and approval steps have become a little clunky; I'm not sure if that is from the demo or not, but they take quite a while.

Clunky in the context of workflow or in the context of responsiveness? If the issue is performance, it is likely because the demo instance uses a lower tier of computing configuration than we do in production.

courtneythaxton commented 6 years ago

Hi @jimmyzhen,

Thanks for the feedback and questions.

@sgoehringer has indicated he is working on the UI for the PMID display for the website, which is great.

The NaN is showing up on the GCI classification matrix. I took screen shots of the issue for the PAK3 curation (attached). I have outlined the images with red circles, and text in pink where applicable. There is an image for the original published GCI classification matrix (PAK3 published classification summary.png), the unpublished display with NaN on the GCI classification matrix (PAK3 unpublished classification summary.png), the republished GCI classification matrix in which I made no changes to segregation and just hit the publish button on the GCI approval again (PAK3 re-published classification summary.png), and lastly the re-published PAK3 curation on the website (PAK3 republish website display.png).

"Clunky" was referring to the responsiveness of the demo GCI site when going through the provisional and final approval, as well as the publish. There was quite a lag when completing each stage. Thank you for the feedback on this and clarification.

pak3 re-published classification summary

sgoehringer commented 6 years ago

@bryanwulf would it be possible to review the data for VPS13B? I've been working with @courtneythaxton and it has been determined that one PMID should show for exome (Rafiq PMID: 26104215) and two for candidate (Rafiq PMID: 26104215 and Mochida PMID: 15173253) in the segregation area. I checked the data that I have and I see (below) which doesn't include everything she was expecting. Can you review?

+++++++++++++++++++++++
iri = ccfc68bc-9f00-4565-b2c7-70ad9f8e2002--2018-09-10T18:18:25
perm_id = ccfc68bc-9f00-4565-b2c7-70ad9f8e2002--2018-09-10T18:18:25
+++++++++++++++++++++++

"CaseLevelData": {
  "SegregationEvidence": {
    "ExomeSequencingMethod": {
      "SummedLod": 1.45,
      "FamilyCount": 1
    },
    "PointsCounted": 1.9,
    "CandidateSequencingMethod": {
      "SummedLod": 3.86,
      "FamilyCount": 3,
      "Evidence": {
        "Publications": [
          {
            "pubdate": "2015 Jun 25",
            "source": "BMC medical genetics",
            "author": "Rafiq MA",
            "title": "Novel VPS13B Mutations in Three Large Pakistani Cohen Syndrome Families Suggests a Baloch Variant with Autistic-Like Features.",
            "pmid": "26104215"
          }
        ]
      }
    },
    "TotalPoints": 5.31
  },
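A quick way to spot this kind of gap is to list the PMIDs per sequencing method and flag any method that counts families but carries no publications. The sketch below is only a checking aid, not part of the GCI or the website code; it uses a trimmed copy of the payload above (the `CaseLevelData` wrapper and most publication fields are dropped), and the helper name `pmids_by_method` is made up for illustration.

```python
import json

# Trimmed copy of the SegregationEvidence payload shown above
PAYLOAD = """
{
  "SegregationEvidence": {
    "ExomeSequencingMethod": {"SummedLod": 1.45, "FamilyCount": 1},
    "PointsCounted": 1.9,
    "CandidateSequencingMethod": {
      "SummedLod": 3.86,
      "FamilyCount": 3,
      "Evidence": {"Publications": [{"author": "Rafiq MA", "pmid": "26104215"}]}
    },
    "TotalPoints": 5.31
  }
}
"""


def pmids_by_method(segregation: dict) -> dict:
    """Map each '...SequencingMethod' key to the PMIDs listed under it."""
    result = {}
    for key, value in segregation.items():
        if not key.endswith("SequencingMethod"):
            continue  # skip scalar fields such as PointsCounted
        publications = value.get("Evidence", {}).get("Publications", [])
        result[key] = [pub["pmid"] for pub in publications]
    return result


segregation = json.loads(PAYLOAD)["SegregationEvidence"]
found = pmids_by_method(segregation)

# Methods that count at least one family but list no publications
missing = [
    method
    for method, pmids in found.items()
    if segregation[method]["FamilyCount"] > 0 and not pmids
]

print(found)
print(missing)
```

For this payload the exome method is flagged, and the candidate method lists only the Rafiq PMID, which matches what was seen on the website.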

bryanwulf commented 6 years ago

@courtneythaxton @sgoehringer Courtney and the Gene Curation Small working group were on the right track. The code that sends the message/data to the Data Exchange was excluding segregation evidence (PMIDs) when the corresponding family included a scored proband. The goal of this logic, which I've now removed, was to generate a website summary that closely matched the evidence summary in the GCI (that the list of PMIDs in "Segregation Evidence" on the website match the list under "Genetic Evidence: Case Level (family segregation information without proband data or scored proband data)" in the GCI).
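As a rough illustration of the difference, the sketch below contrasts the old filtering with the updated behaviour. It is not the actual GCI code: the record layout and the `has_scored_proband` flags are invented for the example, and only the PMIDs come from the VPS13B case discussed above.

```python
from collections import defaultdict

# Illustrative records only: the PMIDs come from the thread, but the
# has_scored_proband flags are invented to demonstrate the filter.
FAMILIES = [
    {"pmid": "15173253", "method": "candidate", "has_scored_proband": True},
    {"pmid": "26104215", "method": "candidate", "has_scored_proband": False},
    {"pmid": "26104215", "method": "exome", "has_scored_proband": True},
]


def segregation_pmids(families, exclude_scored_probands):
    """Collect the PMIDs per sequencing method for the outgoing message."""
    pmids = defaultdict(set)
    for family in families:
        if exclude_scored_probands and family["has_scored_proband"]:
            continue  # old behaviour: drop families with a scored proband
        pmids[family["method"]].add(family["pmid"])
    return {method: sorted(values) for method, values in pmids.items()}


old = segregation_pmids(FAMILIES, exclude_scored_probands=True)
new = segregation_pmids(FAMILIES, exclude_scored_probands=False)

print(old)
print(new)
```

With the exclusion switched off, both candidate PMIDs and the exome PMID are reported, which is the list Courtney expected for VPS13B.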

To test the updated code, one should only need to unpublish and republish an existing classification. It may take a few minutes for the website to display the changes, but these steps worked for me.

I'm still investigating the NaN issue...

bryanwulf commented 6 years ago

@courtneythaxton I think the appearance of NaN is a result of viewing an evidence summary generated from data in an older model (prior to the segregation scoring changes). For instance, looking at another curation (for OFD1) that hasn't had any activity in the test instance: https://publish-testing-int.demo.clinicalgenome.org/provisional-curation/?gdm=1d6fa43a-afc2-4b66-b892-51a944cd28a4&edit=yes

when you click on any of the buttons to view an approved/provisional summary, you'll see NaN in the matrix on the resulting page. The matrix on the provided link/page appears correct (no NaN) because it's being generated from the updated data model (where any existing segregation scoring has been placed in the candidate category).

If you're looking to publish a classification with updated segregation scoring, you'll need to start with (re)saving the classification (going through the entire approval process).
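One plausible way this produces NaN, sketched under assumptions: an old snapshot stores only a total LOD score, so reading the new per-category fields finds nothing. In JavaScript, arithmetic on a missing value yields NaN; the Python sketch mimics that by substituting `float("nan")` for absent fields. The field names (`candidate_lod`, `exome_lod`, `total_lod`) and the exome default of 0.0 are hypothetical and not taken from the GCI data model.

```python
import math


def category_lods(snapshot: dict) -> dict:
    """Read per-category LOD scores; absent fields become NaN."""
    nan = float("nan")
    return {
        "candidate": float(snapshot.get("candidate_lod", nan)),
        "exome": float(snapshot.get("exome_lod", nan)),
        "total": float(snapshot.get("total_lod", nan)),
    }


def migrate(snapshot: dict) -> dict:
    """Place pre-existing segregation scoring into the candidate category."""
    if "candidate_lod" in snapshot:
        return dict(snapshot)  # already in the updated model
    # Assumption: exome starts at 0.0 because nothing was categorised as exome
    return {**snapshot, "candidate_lod": snapshot["total_lod"], "exome_lod": 0.0}


# Old-model snapshot: only a total is stored (5.24 is the UBE2A value above)
legacy = {"total_lod": 5.24}

before = category_lods(legacy)
after = category_lods(migrate(legacy))

print(math.isnan(before["candidate"]), before["total"])
print(after["candidate"], after["exome"], after["total"])
```

This matches the reported symptom, where the per-category entries show NaN while the total LOD score still displays.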

courtneythaxton commented 6 years ago

Thanks @bryanwulf for checking on this NaN and providing clarification.

I think it will be important for us (the GCI team, the Gene Curation Working Group, or both) to specify how curators should go about updating the information for segregation. So if I have this correct, for any curation with segregation that needs changes, the curator will need to first unpublish (if necessary), then adjust the segregation scoring, create a new provisional classification, approve the provisional, and then publish again. Is this correct?

My next question concerns curations with segregation evidence that does not need to be changed, i.e. all the evidence is for candidate sequencing (which is the default setting for this release, if I remember correctly). In this situation, does the curator/coordinator need to perform any changes or tasks in the GCI? Or will the information stay published to the website as is, but with the new visualization?

@jennygoldstein @erinriggs @marinadistefano @wrightmw @sgoehringer

bryanwulf commented 6 years ago

@courtneythaxton I can answer your questions from a technical perspective, but @wrightmw (when he returns from vacation) may have a different opinion, coming from a curator/user perspective.

Your steps for republishing a classification in order to include updated segregation scoring are correct. The first step - unpublishing the existing classification - is optional. If it's not done, when the newly-approved classification is published, the status of the existing one will automatically change from published to unpublished.

As to the classification's appearance on the website, if the Data Exchange isn't notified that the segregation scoring has changed (via the republish steps above), the existing visualization/matrix will remain (based on non-categorized segregation scoring).
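The publish behaviour described above (a newly published classification automatically unpublishes the existing one) can be sketched as a tiny state model. The class and attribute names here are hypothetical and do not reflect the GCI code; the sketch only captures the status change.

```python
from dataclasses import dataclass, field


@dataclass
class Classification:
    label: str
    status: str = "approved"


@dataclass
class Curation:
    classifications: list = field(default_factory=list)

    def publish(self, new: Classification) -> None:
        """Publish `new`, unpublishing whatever was published before."""
        for existing in self.classifications:
            if existing.status == "published":
                existing.status = "unpublished"
        new.status = "published"
        # Identity check so a re-published snapshot is not stored twice
        if not any(item is new for item in self.classifications):
            self.classifications.append(new)


curation = Curation()
old_snapshot = Classification("approved before segregation changes")
curation.publish(old_snapshot)

# No manual unpublish step: publishing the new one flips the old status
new_snapshot = Classification("approved with categorised segregation")
curation.publish(new_snapshot)

print(old_snapshot.status, new_snapshot.status)
```

Skipping the explicit unpublish step therefore ends in the same state as doing it by hand first.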

wrightmw commented 6 years ago

Hi @courtneythaxton I think you are totally correct that we (the GCI and the Gene Curation Working group) need to specify how curators should go about updating the information for segregation.

Each time a curator clicks the Publish button all the relevant data about that classification is sent via the Data Exchange to the website for display. If there is already a classification on the website then it gets automatically replaced by the latest published version.

If it's a classification that has been affected by the segregation changes, then the curator should assess the changes (candidate vs exome), check the new scoring, save a new Classification Matrix, create a new Provisional, create a new Approved, and then publish it to the website. Again, publishing to the website will automatically unpublish the current website classification and replace it with the new one.

All the current Provisional and Approved snapshots were saved based on the old matrix and segregation scores. IMHO, curators should not "republish" (by unpublishing and publishing again) the old approved snapshots because they are not compatible with the new scoring. In effect, they would be sending an old format classification to the website to be displayed in the new format. Therefore, I think "republishing" should probably be disabled for all approved classifications that are in the old format. Until a curator publishes a new classification to the website (or unless they choose to unpublish it), then the old classification with the old formatting will remain in place. If they want their website classification to be replaced with one in the new format then they will need to publish a new approved classification (by saving a new provisional/approved) that has the new formatting.

@jimmyzhen @bryanwulf @jennygoldstein @ErinRiggs @marinadistefano @sgoehringer

jimmyzhen commented 6 years ago

Included in the R22 release. Thank you.