diskin-lab-chop / AutoGVP


fix submission file loading bug #218

Closed rjcorb closed 9 months ago

rjcorb commented 9 months ago

Purpose/implementation Section

What feature is being added or bug is being addressed?

Closes #216. This PR fixes a bug in the select-clinVar-submissions.R script that caused the contents of some rows to be ignored.

What was your approach?

I removed the comment = "#" argument from vroom() when reading in submission_summary.txt.gz. In its place, the script determines the number of lines to skip with a while() loop that reads lines from the file until it reaches one that does not start with #. That count is stored in the variable skip_lines and passed to vroom() as the value of skip.
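The approach above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper name count_comment_lines() and the toy input file are hypothetical, and the toy header row has no leading #, which may differ from the real submission_summary.txt.gz. The motivation, as I understand vroom's documented behavior, is that comment = "#" causes any text after a # to be ignored, so a field that contains # mid-line could truncate the row.

```r
# Hypothetical helper (not from the PR): count the leading lines that start with "#".
count_comment_lines <- function(path) {
  con <- gzfile(path, open = "r")  # gzfile() can also read uncompressed files
  on.exit(close(con))
  skip_lines <- 0
  while (TRUE) {
    line <- readLines(con, n = 1)
    # Stop at end of file or at the first line that does not start with "#"
    if (length(line) == 0 || !startsWith(line, "#")) break
    skip_lines <- skip_lines + 1
  }
  skip_lines
}

# Toy input (assumed layout): two comment lines, a header, and a row with "#" inside a field
tmp <- tempfile(fileext = ".txt.gz")
out <- gzfile(tmp, open = "w")
writeLines(
  c("# comment 1", "# comment 2",
    "VariationID\tDescription",
    "1\tsee note #3 for details"),
  out
)
close(out)

skip_lines <- count_comment_lines(tmp)  # 2 for this toy file

# Pass the count to vroom() via skip instead of using comment = "#"
if (requireNamespace("vroom", quietly = TRUE)) {
  submissions <- vroom::vroom(tmp, delim = "\t", skip = skip_lines,
                              show_col_types = FALSE)
}
```

Because only the leading block of # lines is skipped, a # that appears later inside a field is read as ordinary data.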

What GitHub issue does your pull request address?

#216

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Please review the updated code and confirm that the script runs successfully:

Rscript scripts/select-clinVar-submissions.R --variant_summary data/variant_summary.txt.gz --submission_summary data/submission_summary.txt.gz --outdir ../bug_fix --conceptID_list data/clinvar_all_disease_concept_ids.txt --conflict_res "latest"

I have tested the new code, and all rows of submission_summary.txt.gz are now loaded in their entirety. The bug was found because variant 186251 was being resolved incorrectly (it was called LP); it is now called VUS, as expected.

Is there anything that you want to discuss further?

No