diskin-lab-chop / AutoGVP


Rjcorb/234 update variant ids col #236

Closed rjcorb closed 5 months ago

rjcorb commented 5 months ago

Purpose/implementation Section

What feature is being added or bug is being addressed?

Closes #234. This PR renames the rs_id column to variant_ids to reflect cases where multiple variant ID sources are listed. The separator between IDs is also changed from & to ;

What was your approach?

What GitHub issue does your pull request address?

234

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Please run the custom test samples through AutoGVP and confirm that the variant_ids column is present and that, when applicable, variant IDs are separated by ;

bash run_autogvp.sh --workflow="custom" \
--vcf=data/test_VEP.vcf \
--clinvar=data/clinvar.vcf.gz \
--intervar=data/test_VEP.hg38_multianno.txt.intervar \
--multianno=data/test_VEP.vcf.hg38_multianno.txt \
--autopvs1=data/test_autopvs1.txt \
--outdir=results \
--out="test_custom"
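One way to check both points after the run finishes is a small shell helper like the sketch below. It is not part of AutoGVP: the function name `check_variant_ids` is hypothetical, and it assumes the output table is tab-delimited with a header row. Pass it the path of the output table written under results/.

```shell
# Hypothetical helper (not part of AutoGVP): verify that a results table has a
# variant_ids column and that no value in that column still contains "&".
# Assumption: the table is tab-delimited with a single header row.
check_variant_ids() {
  awk -F'\t' '
    NR == 1 {
      # Locate the variant_ids column in the header.
      for (i = 1; i <= NF; i++) if ($i == "variant_ids") col = i
      if (!col) { print "variant_ids column not found"; bad = 1; exit }
      next
    }
    # Flag any data row whose variant_ids value still uses the old separator.
    index($col, "&") { printf "line %d still contains &: %s\n", NR, $col; bad = 1 }
    END { exit bad + 0 }
  ' "$1"
}
```

The function exits with status 0 when the column is present and free of &, and with status 1 otherwise, printing the offending lines.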

Is there anything that you want to discuss further?

No

Documentation Checklist

jharenza commented 5 months ago

I had to clone the repo from scratch, but running the command above gives an error:

(base) ubuntu@X:~/AutoGVP$ bash run_autogvp.sh --workflow="custom"
.
select ClinVar submission file not specified. Running select-ClinVar-submissions Rscript...
variant summary and/or submission_summary file(s) not specified. Checking if files exist in data/...
variant_summary and/or submission_summary files not found. Downloading latest versions from ClinVar...
Warning: Illegal date format for -z, --time-cond (and not a file name). 
Warning: Disabling time condition. See curl_getdate(3) for valid date syntax.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   166  100   166    0     0   2721      0 --:--:-- --:--:-- --:--:--  2721
Downloading clinvar.vcf.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 77.8M  100 77.8M    0     0  59.6M      0  0:00:01  0:00:01 --:--:-- 59.6M
Downloading submission_summary.txt.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  190M  100  190M    0     0  46.1M      0  0:00:04  0:00:04 --:--:-- 46.1M
Downloading variant_summary.txt.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  200M  100  200M    0     0  63.2M      0  0:00:03  0:00:03 --:--:-- 63.2M
Checking MD5 hashes...
clinvar.vcf.gz: OK
submission_summary.txt.gz: OK
variant_summary.txt.gz: OK
resolving ClinVar conflicts using default parameters...
run_autogvp.sh: line 143: Rscript: command not found
(base) ubuntu@X:~/AutoGVP$ --vcf=data/test_VEP.vcf
-bash: --vcf=data/test_VEP.vcf: No such file or directory
(base) ubuntu@X:~/AutoGVP$ --clinvar=data/clinvar.vcf.gz
-bash: --clinvar=data/clinvar.vcf.gz: No such file or directory
(base) ubuntu@X:~/AutoGVP$ --intervar=data/test_VEP.hg38_multianno.txt.intervar
-bash: --intervar=data/test_VEP.hg38_multianno.txt.intervar: No such file or directory
(base) ubuntu@X:~/AutoGVP$ --multianno=data/test_VEP.vcf.hg38_multianno.txt
-bash: --multianno=data/test_VEP.vcf.hg38_multianno.txt: No such file or directory
(base) ubuntu@X:~/AutoGVP$ --autopvs1=data/test_autopvs1.txt
-bash: --autopvs1=data/test_autopvs1.txt: No such file or directory
(base) ubuntu@X:~/AutoGVP$ --outdir=results
--outdir=results: command not found

jharenza commented 5 months ago

Actually, scratch that. I needed to put the command above on one line, since what I pasted did not have the line separators. It is running now.

naqvia commented 5 months ago

It ran to completion, and I can confirm there are no longer any & symbols, but I agree with the above comments.

rjcorb commented 5 months ago

@jharenza @naqvia the requested updates have been implemented in the latest commits; let me know what you think. I am a bit confused by some of the nomenclature of the rsIDs in these test files: some start with "ss" instead of "rs". I've generally been avoiding using these files since they were somewhat manually generated by Jung, but our other pbta test set didn't have any instances of multiple rsIDs, so it would not have been useful in this case.