Molmed / checkQC

CheckQC inspects the content of an Illumina runfolder and determines if it passes a set of quality criteria
http://checkqc.readthedocs.io/
GNU General Public License v3.0

percent_phix on undetermined_percentage_handler.py #110

Open sgaleraalq opened 1 year ago

sgaleraalq commented 1 year ago

Good morning.

I am opening this issue because I was trying to use CheckQC in a preliminary analysis at our company. The tool breaks when it tries to find "percent_phix" in our run information, since our data has no field corresponding to it. When I comment out the parts of the code that use "percent_phix", the tool finishes correctly.

However, the MultiQC analysis that we run afterwards requires "percent_phix" to be present in the output in order to place the CheckQC results correctly in its final plots.

I think the reason we do not get this "percent_phix" field is that we demultiplex our samples without splitting them by lane (but this is just a guess, I am not completely sure). Would it be possible to release a new version that does not require that field?

I attach to this issue the parts of your original code that I changed, together with my data, so you can work with them if you want to.

  def compute_threshold(value):
      return value # + mean_phix_per_lane[lane_nbr]

  def create_data_dict(value):
      return {"lane": lane_nbr,
              "percentage_undetermined": percentage_undetermined,
              "threshold": value,
              "computed_threshold": compute_threshold(value)}
              # "phix_on_lane": mean_phix_per_lane[lane_nbr]}
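
Rather than commenting the phiX terms out, one option might be to treat the phiX value as optional. The following is only a minimal sketch of that idea, not CheckQC's actual code: the function name `build_lane_record` and its signature are hypothetical, and only the dictionary keys are taken from the snippet above. It falls back to the raw threshold when no phiX value is known for a lane and always emits the "phix_on_lane" key, using `None` when the value is unknown. Whether MultiQC accepts `None` there is an untested assumption.

```python
from typing import Dict, Mapping, Optional


def build_lane_record(
    lane_nbr: int,
    percentage_undetermined: float,
    threshold: float,
    mean_phix_per_lane: Optional[Mapping[int, float]] = None,
) -> Dict[str, object]:
    """Hypothetical helper: build the per-lane record with optional phiX data."""
    # Look up the phiX percentage for this lane, if any data is available.
    phix = None
    if mean_phix_per_lane is not None:
        phix = mean_phix_per_lane.get(lane_nbr)

    # Without phiX information the threshold is left unchanged.
    computed_threshold = threshold + (phix if phix is not None else 0.0)

    return {
        "lane": lane_nbr,
        "percentage_undetermined": percentage_undetermined,
        "threshold": threshold,
        "computed_threshold": computed_threshold,
        # Always present, so downstream consumers can rely on the key.
        "phix_on_lane": phix,
    }
```

With this shape, a call such as `build_lane_record(1, 5.0, 9.0)` would keep the threshold at 9.0, while passing `{1: 1.5}` as `mean_phix_per_lane` would raise the computed threshold to 10.5.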

checkqc_trial.zip

Command I run:

checkqc --downgrade-errors UndeterminedPercentageHandler .

I also changed the config.yaml to:

parser_configurations:
  StatsJsonParser:
    # Path to where the bcl2fastq output (i.e. fastq files, etc) is located relative to
    # the runfolder
    bcl2fastq_output_path: Reports/legacy
  SamplesheetParser:
    samplesheet_name: Reports/SampleSheet.csv

Aratz commented 1 year ago

Hi, thanks for reporting this!

We have started discussing implementing a fix for this, I'll come back to you when I know more.

Meanwhile, does the runfolder you're running from also include the InterOp files? From what I could dig up this is where the information about the phiX percentage is located (specifically I suspect it's in TileMetricsOut.bin but the other files might be needed too). If that still does not work, would it be possible to upload these files here?

I tried copying an InterOp folder from one of our test runfolders into your test data, and I could at least get past the phiX error message.
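
A quick way to check whether a runfolder contains the InterOp data mentioned above is sketched below. This is only an illustration and not part of CheckQC: the helper name `missing_interop_files` is hypothetical, and the default list of expected files contains only TileMetricsOut.bin because that is the one file named in this thread. Other InterOp files may be needed as well.

```python
from pathlib import Path
from typing import List, Sequence


def missing_interop_files(
    runfolder: str,
    expected: Sequence[str] = ("TileMetricsOut.bin",),
) -> List[str]:
    """Hypothetical helper: list expected InterOp files absent from a runfolder."""
    interop_dir = Path(runfolder) / "InterOp"
    # A file counts as missing if the InterOp directory or the file itself is absent.
    return [name for name in expected if not (interop_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_interop_files(".")
    if missing:
        print("Missing InterOp files:", ", ".join(missing))
    else:
        print("All expected InterOp files were found.")
```

Running it from inside the runfolder (the same directory passed to `checkqc`) reports which of the expected files could not be found.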