Clinical-Genomics / BALSAMIC

Bioinformatic Analysis pipeLine for SomAtic Mutations In Cancer
https://balsamic.readthedocs.io/
MIT License

New PON reference for GMCKsolid panel #836

Closed ashwini06 closed 2 years ago

ashwini06 commented 2 years ago

Is your feature request related to a problem? Please describe.

The idea is to generate a PON (panel of normals) reference using all the normal blood samples from the GMCKsolid panel that were previously sequenced here at Clinical Genomics.

Describe the solution you'd like

Samples were selected by filtering on target panel 'GMCKsolid', restricted to Tumor-Normal cases (irrespective of customer ID) in which the normal sample is of blood origin. This search returned ~70 samples, of which 53 can be used for creating the GMCKsolid PON. After removing duplicate sample IDs, 26 samples remain for PON generation.

GMCKsolid_normalblood_custID_VW.xlsx

Expected output for the feature

Using the PON workflow, generate a PON reference: GMCKsolid_caseid_PON_reference.cnn

ashwini06 commented 2 years ago

Solution from @karlnyr: cg commands to link multiple FASTQ files to a single case ID.

[0|0|0] 10d [hiseq.clinical@hasta:~] [P_main] 19s 2 $ cg add family --priority standard -p OMIM-AUTO -a balsamic -dd scout cust000 panel_of_normal_20211222
givingcobra: new case added
[0|0|0] θ60° 10d [hiseq.clinical@hasta:~] [P_main] 22s $ for sample in `awk '$1 !~ /sample_id/ {print $1}' /home/proj/long-term-stage/cancer/PON_analysis_runs_APJ/GMCKsolid_normalblood_custID_VW.txt | uniq`; do cg add relationship -s unknown givingcobra $sample; done
cg workflow balsamic link givingcobra
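
One caveat with the loop above: `uniq` only collapses adjacent duplicate lines, so repeated sample IDs that are not next to each other in the file would be passed to `cg add relationship` more than once. The following is a minimal Python sketch of the same extraction step with order-preserving de-duplication. It assumes, as the awk command does, a whitespace-separated file whose first column holds the sample ID and whose header row contains `sample_id`; the function names and the sample IDs in the demo are hypothetical and not part of cg or BALSAMIC.

```python
from typing import Iterable, List


def unique_sample_ids(rows: Iterable[List[str]], header: str = "sample_id") -> List[str]:
    """Return first-column values in first-seen order, without duplicates.

    Rows whose first field contains `header` are skipped, mirroring
    awk '$1 !~ /sample_id/'.
    """
    seen = set()
    ordered = []
    for row in rows:
        if not row:
            continue  # blank line
        sample_id = row[0]
        if header in sample_id:
            continue  # header row
        if sample_id not in seen:
            seen.add(sample_id)
            ordered.append(sample_id)
    return ordered


def read_sample_ids(path: str) -> List[str]:
    """Read a whitespace-separated file and return its unique sample IDs."""
    with open(path) as handle:
        return unique_sample_ids(line.split() for line in handle)


if __name__ == "__main__":
    # Hypothetical rows; the duplicate SAMPLE_A is not adjacent to the first one.
    demo = [
        ["sample_id", "customer"],
        ["SAMPLE_A", "cust_x"],
        ["SAMPLE_B", "cust_y"],
        ["SAMPLE_A", "cust_z"],
    ]
    print(unique_sample_ids(demo))  # ['SAMPLE_A', 'SAMPLE_B']
```

In the shell loop itself, replacing `uniq` with `sort -u` would have a similar effect, at the cost of changing the order of the IDs.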

Sometimes the files first have to be decompressed, in which case the cg workflow balsamic link command fails with an error like:

cg workflow balsamic link givingcobra
Case givingcobra exists in status db
Fetching files from bundle ACC6516A1
Fetching files with tags in [fastq]
Concatenation in progress for sample ACC6516A1.
Concatenating:  ->
Traceback (most recent call last):
  File "/home/proj/bin/conda/envs/P_main/bin/cg", line 8, in <module>
    sys.exit(base())
  File "/home/proj/bin/conda/envs/P_main/lib/python3.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/proj/bin/conda/envs/P_main/lib/python3.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/proj/bin/conda/envs/P_main/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/proj/bin/conda/envs/P_main/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/proj/bin/conda/envs/P_main/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/proj/bin/conda/envs/P_main/lib/python3.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/proj/bin/conda/envs/P_main/lib/python3.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/proj/bin/conda/envs/P_main/lib/python3.7/site-packages/click/decorators.py", line 27, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/home/proj/bin/conda/envs/P_main/lib/python3.7/site-packages/cg/cli/workflow/commands.py", line 70, in link
    analysis_api.link_fastq_files(case_id=case_id)
  File "/home/proj/bin/conda/envs/P_main/lib/python3.7/site-packages/cg/meta/workflow/balsamic.py", line 129, in link_fastq_files
    case_obj=case_obj, sample_obj=link.sample, concatenate=True
  File "/home/proj/bin/conda/envs/P_main/lib/python3.7/site-packages/cg/meta/workflow/analysis.py", line 304, in link_fastq_files_for_sample
    self.fastq_handler.concatenate(linked_reads_paths[read], concatenated_paths[read])
  File "/home/proj/bin/conda/envs/P_main/lib/python3.7/site-packages/cg/meta/workflow/fastq.py", line 37, in concatenate
    with open(concat_file, "wb") as write_file_obj:
FileNotFoundError: [Errno 2] No such file or directory: ''
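
The traceback ends in `open(concat_file, "wb")` being called with an empty string, and the log line `Concatenating:  ->` shows neither input files nor an output path, which suggests that no FASTQ files were found for the sample at that point. The sketch below shows one way such a concatenation step could fail early with a clearer message. It is not the cg implementation: `concatenate_checked` and its error messages are hypothetical, written only to illustrate the guard.

```python
import shutil
from pathlib import Path
from typing import Sequence


def concatenate_checked(input_paths: Sequence[str], output_path: str) -> None:
    """Concatenate files byte-wise, validating the arguments first.

    Hypothetical helper: raises a descriptive error instead of letting
    open('') fail with "No such file or directory: ''".
    """
    if not input_paths:
        raise ValueError("No input FASTQ files to concatenate")
    if not output_path:
        raise ValueError("Output path for concatenated FASTQ is empty")

    missing = [path for path in input_paths if not Path(path).is_file()]
    if missing:
        raise FileNotFoundError(f"Input files not found: {missing}")

    # Stream each input into the output so large files are not held in memory.
    with open(output_path, "wb") as write_file_obj:
        for path in input_paths:
            with open(path, "rb") as read_file_obj:
                shutil.copyfileobj(read_file_obj, write_file_obj)
```

With a guard like this, a sample whose files are not yet available would surface as "No input FASTQ files to concatenate" rather than as a `FileNotFoundError` on an empty path.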

To decompress and link the files, run:

cg workflow balsamic start givingcobra -r

This command does not run the BALSAMIC analysis, because the case has more than one normal sample; it returns the following error:

ACC8839A2            normal               tgs                  gmcksolid_4.1_hg19_design.bed

Could not create config: Invalid number of normal samples: 26, only up to 1 allowed!!
Aborted!

However, all FASTQ files are now linked to a single case ID under /home/proj/production/cancer/cases/givingcobra/fastq

ivadym commented 2 years ago

Closing since there is nothing else to be done on the BALSAMIC side. This will be addressed in CG: https://github.com/Clinical-Genomics/cg/issues/1522.