How should we handle Patients with multiple Normal samples?

shuntsman-ucsf commented 6 years ago

I have a patient with two Normal FASTQ files. The Recal step fails because multiple files are listed as "Normal" in the patient_ID_conversions table:

- SAMPLES=SC299898_CGCTGATC_L002 SC299898_CGCTGATC_L006 SC299982_AACAACCA_L001 SC299982_AACAACCA_L002
- NORMAL=SC299898_CGCTGATC_L002
SC299898_CGCTGATC_L006
...
...
grep: the -P option only supports a single pattern
ERROR: NORMAL 'SC299898_CGCTGATC_L002
SC299898_CGCTGATC_L006' is not part of SAMPLES: SC299898_CGCTGATC_L002 SC299898_CGCTGATC_L006 SC299982_AACAACCA_L001 SC299982_AACAACCA_L002
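
The failure above could be caught before submitting Recal with a small pre-flight check on the conversions table. The following is a minimal sketch, not part of LG3: the column names `sample_id`, `patient_id` and `sample_type`, the patient ID `Patient1`, and the tumor label `Primary` are assumptions made for illustration, since the thread does not show the actual layout of the table.

```python
from collections import defaultdict


def normals_per_patient(rows):
    """Map each patient ID to the sample IDs labeled exactly 'Normal'."""
    normals = defaultdict(list)
    for row in rows:
        if row["sample_type"] == "Normal":
            normals[row["patient_id"]].append(row["sample_id"])
    return dict(normals)


def patients_with_multiple_normals(rows):
    """Return only the patients that have more than one 'Normal' sample."""
    return {
        patient: ids
        for patient, ids in normals_per_patient(rows).items()
        if len(ids) > 1
    }


# Hypothetical table rows; only the sample IDs come from the log above.
rows = [
    {"sample_id": "SC299898_CGCTGATC_L002", "patient_id": "Patient1", "sample_type": "Normal"},
    {"sample_id": "SC299898_CGCTGATC_L006", "patient_id": "Patient1", "sample_type": "Normal"},
    {"sample_id": "SC299982_AACAACCA_L001", "patient_id": "Patient1", "sample_type": "Primary"},
    {"sample_id": "SC299982_AACAACCA_L002", "patient_id": "Patient1", "sample_type": "Primary"},
]

offenders = patients_with_multiple_normals(rows)
for patient, ids in offenders.items():
    print(f"{patient}: {len(ids)} samples labeled Normal: {' '.join(ids)}")
```

Any patient reported here would trigger the "is not part of SAMPLES" error, because the NORMAL variable ends up holding more than one sample ID.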

What is the correct way to handle it?

Relabel the 2nd Normal as "Normal2" or something similar, then re-run it through Recal (how does this affect downstream steps?)
Merge Normal1 and Normal2 prior to Recal
Discard the 2nd FASTQ from the analysis
Should Recal be modified to handle this?

(not urgent since I have plenty of others to recal, but will want to run eventually)

ivan108 commented 6 years ago

Yes, you need to go through the merging procedure; only 1 Normal is allowed.

shuntsman-ucsf commented 6 years ago

OK, so the correct procedure for multiple Normals is:

Trim
Align
Merge Normals, adjust patientTable
Recal
Merge Tumors, adjust patientTable (only if multiple tumors)
RecalPass2 (only if multiple tumors)
Pindel, Mutect
PostMutect

Or should each Normal be run through Recal separately, and then merged like the Tumors and run again separately through Recal_Pass2?

I think we will hold off on these for now or just use 1 normal.

ivan108 commented 6 years ago

Not quite:

Trim
Align
Recal
Merge all groups, Normals and/or Tumors
Change patient table, replace duplicated sampleIDs with merged IDs
Recal_pass2
Pindel, Mutect
PostMutect
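
The "change patient table" step can be sketched as a small table rewrite. This is an illustration under assumptions, not LG3 code: the column names `sample_id`, `patient_id` and `sample_type`, the patient ID `Patient1`, and the labels of the rows before merging are hypothetical. The merged ID `SC299898_normal_merged` is the one used as an example later in this thread.

```python
def replace_with_merged(rows, member_ids, merged_id, sample_type):
    """Replace the rows of all merged samples with one row for the merged ID.

    The merged row takes the position of the first replaced row and gets
    the label passed in `sample_type`.
    """
    members = set(member_ids)
    hits = [r for r in rows if r["sample_id"] in members]
    missing = members - {r["sample_id"] for r in hits}
    if missing:
        raise ValueError(f"not in table: {sorted(missing)}")
    patients = {r["patient_id"] for r in hits}
    if len(patients) != 1:
        raise ValueError("merged samples must belong to a single patient")

    merged_row = {
        "sample_id": merged_id,
        "patient_id": patients.pop(),
        "sample_type": sample_type,
    }
    out, inserted = [], False
    for r in rows:
        if r["sample_id"] in members:
            if not inserted:  # emit the merged row once, drop the rest
                out.append(merged_row)
                inserted = True
        else:
            out.append(dict(r))
    return out


# Hypothetical table before merging; the second Normal lane carries a
# placeholder label so that only one row is labeled "Normal".
before = [
    {"sample_id": "SC299898_CGCTGATC_L002", "patient_id": "Patient1", "sample_type": "Normal"},
    {"sample_id": "SC299898_CGCTGATC_L006", "patient_id": "Patient1", "sample_type": "Recurrence1"},
    {"sample_id": "SC299982_AACAACCA_L001", "patient_id": "Patient1", "sample_type": "Primary"},
]

after = replace_with_merged(
    before,
    ["SC299898_CGCTGATC_L002", "SC299898_CGCTGATC_L006"],
    "SC299898_normal_merged",
    "Normal",
)
```

The same function would apply to a merged tumor group by passing the tumor sample IDs and the tumor's label instead.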

shuntsman-ucsf commented 5 years ago

The Recal step accepts only one Normal per patient in patient_ID_conversions.tsv; otherwise it crashes.

For example, see the output in the first post, from a run with 2 samples labeled "Normal" for one patient.

You had given an answer, but it doesn't explain how to run recal prior to merging. I would like clarification:

How should additional "Normal"s be labeled in the conversions.tsv prior to merging in order to run recal on that patient?
Does the label (whether Normal or Primary, Recurrence1, etc...) affect how it is handled in the recal step (i.e. Can I just label each as anything prior to merging to get the recal step to run)?
Or do I need to make separate conversions.tsv files for every patient with multiple normals?

ivan108 commented 5 years ago

Yes, you are right, the LG3 pipeline was designed to work with exactly one Normal per patient.

The workaround involves merging the Normals, and it requires changing the patient "conversion" file (PCF). Here are step-by-step directions:

On the first pass, your PCF (PCF1) should contain all normal samples and all other samples, but with only one normal marked as "Normal". The other normal(s) can be labeled anything but "Normal", e.g. "Primary" or "Recurrence1". At this step, those labels are used only to identify a single normal sample.
After PCF1 is set up, run Trim, Align and Recal as usual.
Next, run Merge (or Merge_QC) on the normal samples and give the merged sample a new ID. To run Merge, you need to provide the variables SAMPLES and SAMPLE in addition to the usual variables, e.g.: SAMPLES="SC299898_CGCTGATC_L002 SC299898_CGCTGATC_L006" SAMPLE=SC299898_normal_merged
Next, update PCF1 by replacing all normal samples of the patient with a single merged normal sample labeled "Normal" (= PCF2).
Next, run Recal_pass2.
Run Pindel and MutDet as usual.
Run PostMut as usual.
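
The first and third steps above can be sketched as follows: derive PCF1 by relabeling every Normal after the first, then build the SAMPLES and SAMPLE assignments for Merge. This is a minimal illustration under assumptions, not LG3 code: the column names `sample_id`, `patient_id` and `sample_type`, the patient ID `Patient1`, and both helper functions are hypothetical. How the variables are actually handed to the Merge job is not shown in this thread, so the sketch only produces the assignment string.

```python
import shlex


def make_pcf1(rows, patient_id, placeholder):
    """Keep the patient's first 'Normal'; relabel the others as `placeholder`.

    Returns the relabeled rows and the IDs of all original Normal samples.
    """
    if placeholder == "Normal":
        raise ValueError("placeholder must differ from 'Normal'")
    out, normal_ids = [], []
    for row in rows:
        row = dict(row)  # copy, so the input table is left unchanged
        if row["patient_id"] == patient_id and row["sample_type"] == "Normal":
            normal_ids.append(row["sample_id"])
            if len(normal_ids) > 1:
                row["sample_type"] = placeholder
        out.append(row)
    return out, normal_ids


def merge_variables(sample_ids, merged_id):
    """Build the SAMPLES/SAMPLE assignments, quoted for a POSIX shell."""
    samples = shlex.quote(" ".join(sample_ids))
    return f"SAMPLES={samples} SAMPLE={shlex.quote(merged_id)}"


# Hypothetical original table with two rows labeled "Normal".
table = [
    {"sample_id": "SC299898_CGCTGATC_L002", "patient_id": "Patient1", "sample_type": "Normal"},
    {"sample_id": "SC299898_CGCTGATC_L006", "patient_id": "Patient1", "sample_type": "Normal"},
    {"sample_id": "SC299982_AACAACCA_L001", "patient_id": "Patient1", "sample_type": "Primary"},
]

# "Recurrence1" is one of the example labels above; pick a label that is
# not already used for one of this patient's tumors.
pcf1, normal_ids = make_pcf1(table, "Patient1", "Recurrence1")
print(merge_variables(normal_ids, "SC299898_normal_merged"))
```

Whether a placeholder that contains the substring "Normal" (such as "Normal2") is safe depends on how Recal matches the label, which this thread does not confirm.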

Hope it makes sense...

HenrikBengtsson commented 5 years ago

@shuntsman-ucsf, feel free to propose improvements/how you think this can be best handled. You're currently the only one that is using the pipeline this way, so you're the one best positioned to provide such suggestions.

shuntsman-ucsf commented 5 years ago

I would recommend that the pipeline be changed to accept either multiple "Normal" labels in the Recal step or an "official" label, such as "Normal_rep", that the user puts in the original PCF1. Then keep the error message in Recal2 and beyond to explicitly allow only one "Normal" (and document this in a README somewhere).

However, after merging some sets where there are 2 normals, I am getting an error in Recal2 where it does not seem to find the merged normal file specified by PCF2. I will try a few more tests, and make a separate issue if needed.

UCSF-Costello-Lab / LG3_Pipeline

How should we handle Patients with multiple Normal samples? #93