UCSF-Costello-Lab / LG3_Pipeline

The original LG3 pipeline
https://github.com/UCSF-Costello-Lab/LG3_Pipeline

How should we handle Patients with multiple Normal samples? #93

Open shuntsman-ucsf opened 6 years ago

shuntsman-ucsf commented 6 years ago

I have a patient with two Normal FASTQ samples. The Recal step fails because multiple files are listed as "Normal" in the patient_ID_conversions table:

- SAMPLES=SC299898_CGCTGATC_L002 SC299898_CGCTGATC_L006 SC299982_AACAACCA_L001 SC299982_AACAACCA_L002
- NORMAL=SC299898_CGCTGATC_L002
SC299898_CGCTGATC_L006
...
...
grep: the -P option only supports a single pattern
ERROR: NORMAL 'SC299898_CGCTGATC_L002
SC299898_CGCTGATC_L006' is not part of SAMPLES: SC299898_CGCTGATC_L002 SC299898_CGCTGATC_L006 SC299982_AACAACCA_L001 SC299982_AACAACCA_L002

What is the correct way to handle it?

(not urgent since I have plenty of others to recal, but will want to run eventually)

ivan108 commented 6 years ago

Yes, you need to go through the merging procedure; only one Normal is allowed per patient.
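
A pre-flight check can catch this before Recal is launched. The log in the first post suggests that NORMAL ended up holding two newline-separated IDs, which is what grep then rejects. The following is a minimal sketch, not part of the pipeline: it assumes the patient conversion table is tab-separated with a header containing `lib_ID`, `patient_ID` and `sample_type` columns, and the patient ID, `SF` values and the `Primary` label in the example are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def normals_per_patient(pcf_text):
    """Map each patient_ID to the list of lib_IDs labeled 'Normal'."""
    normals = defaultdict(list)
    reader = csv.DictReader(io.StringIO(pcf_text), delimiter="\t")
    for row in reader:
        if row["sample_type"] == "Normal":
            normals[row["patient_ID"]].append(row["lib_ID"])
    return dict(normals)


def patients_with_multiple_normals(pcf_text):
    """Return only the patients that have more than one Normal row."""
    return {
        patient: libs
        for patient, libs in normals_per_patient(pcf_text).items()
        if len(libs) > 1
    }


# Hypothetical table: sample IDs taken from the log above; the column
# layout, patient ID, SF values and tumor label are assumptions.
EXAMPLE_PCF = "\n".join([
    "lib_ID\tSF\tpatient_ID\tsample_type",
    "SC299898_CGCTGATC_L002\tSF0000\tPatientX\tNormal",
    "SC299898_CGCTGATC_L006\tSF0000\tPatientX\tNormal",
    "SC299982_AACAACCA_L001\tSF0000\tPatientX\tPrimary",
    "SC299982_AACAACCA_L002\tSF0000\tPatientX\tPrimary",
])

for patient, libs in patients_with_multiple_normals(EXAMPLE_PCF).items():
    print(f"{patient}: {len(libs)} Normals ({', '.join(libs)}); merge before Recal_pass2")
```

Running such a check on the table first would turn the grep failure into a readable message that names the affected patient.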

shuntsman-ucsf commented 6 years ago

OK, so correct procedure for multiple Normals is:

Or should each Normal be run through Recal separately, and then merged like the Tumors and run again separately through Recal_Pass2?

I think we will hold off on these for now or just use 1 normal.

ivan108 commented 6 years ago

Not quite:

  1. Trim
  2. Align
  3. Recal
  4. Merge all groups, Normals and/or Tumors
  5. Change patient table, replace duplicated sampleIDs with merged IDs
  6. Recal_pass2
  7. Pindel, Mutect
  8. PostMutect
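
Step 5 in the list above (replacing duplicated sample IDs with merged IDs in the patient table) could be sketched as follows. This is an illustration only, not pipeline code: the column names, the patient ID, the `Primary` label and the `*_merged` IDs are assumptions, and the actual name of a merged sample depends on how the merge step was run.

```python
def replace_with_merged(rows, patient_id, sample_type, merged_id):
    """Collapse all rows of one patient/sample_type group into a single row."""
    out = []
    seen = False
    for row in rows:
        in_group = (row["patient_ID"] == patient_id
                    and row["sample_type"] == sample_type)
        if not in_group:
            out.append(dict(row))
        elif not seen:
            # Keep the first row of the group, pointing it at the merged ID.
            out.append({**row, "lib_ID": merged_id})
            seen = True
        # Remaining rows of the group are dropped.
    return out


# Hypothetical table before merging (sample IDs from the log in the first post).
pcf1 = [
    {"lib_ID": "SC299898_CGCTGATC_L002", "SF": "SF0000", "patient_ID": "PatientX", "sample_type": "Normal"},
    {"lib_ID": "SC299898_CGCTGATC_L006", "SF": "SF0000", "patient_ID": "PatientX", "sample_type": "Normal"},
    {"lib_ID": "SC299982_AACAACCA_L001", "SF": "SF0000", "patient_ID": "PatientX", "sample_type": "Primary"},
    {"lib_ID": "SC299982_AACAACCA_L002", "SF": "SF0000", "patient_ID": "PatientX", "sample_type": "Primary"},
]

# Table for Recal_pass2 and later steps: one row per merged group.
pcf2 = replace_with_merged(pcf1, "PatientX", "Normal", "SC299898_merged")
pcf2 = replace_with_merged(pcf2, "PatientX", "Primary", "SC299982_merged")

for row in pcf2:
    print("\t".join(row[c] for c in ("lib_ID", "SF", "patient_ID", "sample_type")))
```

The original table is left untouched, so the pre-merge and post-merge versions can be kept side by side.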

shuntsman-ucsf commented 5 years ago

The Recal step accepts only one Normal per patient in patient_ID_conversions.tsv; with more than one, it crashes.

Example output from a run with two samples labeled "Normal" for one patient is given in the first post.

You had given an answer, but it doesn't explain how to run recal prior to merging. I would like clarification:

ivan108 commented 5 years ago

Yes, you are right, the LG3 pipeline was designed to work with exactly one Normal per patient.

The workaround involves merging the Normals, and it requires changing the patient "conversion" file (PCF). Here are step-by-step directions:

Hope it makes sense...

HenrikBengtsson commented 5 years ago

@shuntsman-ucsf, feel free to propose improvements/how you think this can be best handled. You're currently the only one that is using the pipeline this way, so you're the one best positioned to provide such suggestions.

shuntsman-ucsf commented 5 years ago

I would recommend that the pipeline be changed so that the Recal step accepts either multiple "Normal" labels or an "official" label such as "Normal_rep" that the user puts in the original PCF1. Recal2 and later steps would then keep the error message and explicitly allow only one "Normal" (with this documented in a README somewhere).

However, after merging some sets where there are 2 normals, I am getting an error in Recal2 where it does not seem to find the merged normal file specified by PCF2. I will try a few more tests, and make a separate issue if needed.
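
The stage-dependent rule proposed in this comment could look roughly like the sketch below. It is hypothetical: the stage names, column names and example rows are assumptions, and nothing here reflects how the pipeline currently validates its input.

```python
from collections import Counter

# Hypothetical stage names; only these stages tolerate several Normals.
MULTI_NORMAL_STAGES = {"recal"}


def check_normal_count(rows, stage):
    """Return a list of error messages; empty if the table is acceptable."""
    counts = Counter(
        r["patient_ID"] for r in rows if r["sample_type"] == "Normal"
    )
    errors = []
    for patient in sorted({r["patient_ID"] for r in rows}):
        n = counts.get(patient, 0)
        if n == 0:
            errors.append(f"{patient}: no Normal sample")
        elif n > 1 and stage not in MULTI_NORMAL_STAGES:
            errors.append(
                f"{patient}: {n} Normal samples, "
                f"but stage '{stage}' allows exactly one"
            )
    return errors


# Hypothetical pre-merge rows for one patient with two Normals.
rows = [
    {"lib_ID": "SC299898_CGCTGATC_L002", "patient_ID": "PatientX", "sample_type": "Normal"},
    {"lib_ID": "SC299898_CGCTGATC_L006", "patient_ID": "PatientX", "sample_type": "Normal"},
    {"lib_ID": "SC299982_AACAACCA_L001", "patient_ID": "PatientX", "sample_type": "Primary"},
]

for stage in ("recal", "recal_pass2"):
    problems = check_normal_count(rows, stage)
    print(stage, "OK" if not problems else problems)
```

With a rule like this, the first Recal pass would run on the unmerged table, while later steps would still stop with an explicit message if the merged table was not supplied.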