Closed hassanfa closed 2 years ago
Ongoing discussion with @keyvanelhami about using normal samples from one of our customers as a panel of normals.
I'd also like to be able to create a case with NO tumor sample in it, but link it all in the balsamic directory.
Could you describe in a little more detail what you imagine the input and output to look like?
Could it be something like:
cg workflow balsamic normal-pool --sample sample-id1 --sample sample-id2 ... --output path/to/analysis/dir/
Or what do you have in mind?
These pools of normal samples contain more than 20 samples each, so I imagine typing them all out would be error-prone. If we create a case with all the samples we want, would the following command be too ambitious?
cg workflow balsamic normal-pool <case_name>
These samples won't be stored in housekeeper, cause the analysis result will be added either to small reference repo or target_capture_bed repo.
Ok, will there be many different pools or just a few that are being reused? If there are many, I suggest we implement it so that the CLI call can take a file with sample IDs as an argument. That would be a flexible way of creating new pools and changing existing ones. Something like:
$cat pool1.csv
sample1
sample2
...
$cg workflow balsamic normal-pool --samples pool1.csv --output /path/to/dir/
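A minimal sketch of how such a pool file could be cleaned up before it is handed to any command. The function name `read_pool` is hypothetical and not part of cg. It assumes one sample ID per line, ignores blank lines and lines starting with `#`, and drops repeated IDs while keeping the original order.

```shell
# Hypothetical helper, not part of cg: print the unique sample IDs in a pool file.
# Assumption: one sample ID per line; blank lines and "#" comment lines are ignored.
read_pool() {
    # Remove carriage returns first so files saved on Windows behave the same.
    tr -d '\r' < "$1" | awk '
        { gsub(/^[ \t]+|[ \t]+$/, "") }   # trim leading and trailing whitespace
        $0 == "" || /^#/ { next }          # skip blank lines and comments
        !seen[$0]++                        # print each ID the first time it appears
    '
}
```

For example, `read_pool pool1.csv | wc -l` would give the number of distinct samples in the pool, which is a quick sanity check before creating a case.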
These samples won't be stored in housekeeper, cause the analysis result will be added either to small reference repo or target_capture_bed repo.
Do you mean that the samples don't exist in housekeeper at all? I assume that the system will know about the samples otherwise it will be tricky.
Hi! Is this feature something that will be routinely used in production, or a research venture? If it's not yet a routine analysis, perhaps we could build a specialized script or package to link files and create the config. Currently our setup for running balsamic in production is tailored to a specific definition of what a case and its config file should look like.
A csv file is just fine. It is actually easier: we can query statusdb separately and create a list of the samples that we would like to use. And we (cancer bioinfo) can store these lists in a repository for the record.
I mean the final result will not be stored in the housekeeper. The final results are two tsv files, and possibly one vcf.
It will be for production and routine analysis.
Ok then I think I understand. We can consider if this should be a separate service or something that is included in the CG codebase. That choice should not affect the final thing so much.
Production will use it to generate pool of normal results. I think the solution should definitely consider that. Wherever it is, it should be easily accessible for them.
bump
Hi! You can do the linking part already!
We can automate this later, but that will need a more clearly defined workflow and a project!
Update from cancer team:
@Mropat: I see there has been a lot of discussion on this in the past. Is it now possible to link multiple fastq files from different cases to one case ID?
Right now, I have around 50 samples (/home/proj/long-term-stage/cancer/PON_analysis_runs_APJ/GMCKsolid_PONsamplelist.txt) whose fastq files need to be grouped under one case_id. Is it possible to do this with cg now?
Solution from @karlnyr
cg commands to link multiple fastq files to single case-id
$ cg add family --priority standard -p OMIM-AUTO -a balsamic -dd scout cust000 panel_of_normal_20211222
givingcobra: new case added
$ for sample in `awk '$1 !~ /sample_id/ {print $1}' /home/proj/long-term-stage/cancer/PON_analysis_runs_APJ/GMCKsolid_normalblood_custID_VW.txt | uniq`; do cg add relationship -s unknown givingcobra $sample; done
$ cg workflow balsamic link givingcobra
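With around 50 samples it may be worth reviewing the commands before anything is added. The sketch below wraps the loop above in a dry-run function that only prints the `cg add relationship` calls. The name `print_link_commands` is hypothetical; the cg command and the `sample_id` header filter are copied from the commands above. One assumed difference: repeated sample IDs are dropped even when they are not on adjacent lines, whereas `uniq` only collapses adjacent repeats.

```shell
# Hypothetical dry-run helper: prints, but does not run, one cg command per sample.
# $1 = case ID (for example givingcobra)
# $2 = whitespace-separated sample list with the sample ID in column 1
print_link_commands() {
    case_id=$1
    # Skip empty lines and the header row, and keep each sample ID only once.
    awk '$1 != "" && $1 !~ /sample_id/ && !seen[$1]++ { print $1 }' "$2" |
    while read -r sample; do
        printf 'cg add relationship -s unknown %s %s\n' "$case_id" "$sample"
    done
}
```

Once the printed list looks right, the output could be piped to `sh` to run the commands, followed by `cg workflow balsamic link` on the case as shown above.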
Additional features will not be implemented to address this
As a busy person, I want to drink coffee so that I wake up. And sometimes I want to link multiple FastQ files in an analysis directory to run BALSAMIC for generating background data.
Problem:
Expected outcome / suggested solution:
For a multi-tumor or multi-non-tumor case, I'd like to only link the FastQ files and only create the config. NO analysis should start.
Questions regarding issue:
Q: Does it need to run analysis?
A: No
Q: Do these cases need to be compressed?
A: Maybe not. They are not validation cases, but they should be accessible to generate pool of normals easier
Q: Will these FastQ files be internally sequenced or external?
A: Internal for now, but it would be neat to be able to add externally sequenced samples
Q: Will customer order these or will this case be created by us?
A: It will be created by us.
Q: Is this high priority?
A: Probably. One of the customers is waiting for using a pool of normals in analysis, and we just received the list of samples that we can use to generate a mega case.
What needs to be done on the planning meeting:
See https://github.com/Clinical-Genomics/development/blob/master/git/issue-reports.md for more!