Molmed / checkQC

CheckQC inspects the content of an Illumina runfolder and determines if it passes a set of quality criteria
http://checkqc.readthedocs.io/
GNU General Public License v3.0
25 stars 16 forks source link

Error when using lower bound in read length interval #53

Closed matrulda closed 6 years ago

matrulda commented 6 years ago

I found a bug when processing a hiseq2500_rapidhighoutput_v4 with read length 50.

ERROR:

INFO     ------------------------
INFO     Starting checkQC (1.1.2)
INFO     ------------------------
INFO     Runfolder is: /data/biotank7/runfolders/180124_D00457_0238_BCB2B1ANXX
INFO     No config file specified, using default config from /opt/checkqc/lib/python3.6/site-packages/checkQC/default_config/config.yaml.
ERROR    Could not find a config entry for instrument 'hiseq2500_rapidhighoutput_v4' with read length '50'. Please check the provided config file 
Traceback (most recent call last):
  File "/opt/checkqc/bin/checkqc", line 11, in <module>
    sys.exit(start())
  File "/opt/checkqc/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/opt/checkqc/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/opt/checkqc/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/checkqc/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/opt/checkqc/lib/python3.6/site-packages/checkQC/app.py", line 36, in start
    app.run()
  File "/opt/checkqc/lib/python3.6/site-packages/checkQC/app.py", line 66, in run
    reports = self.configure_and_run()
  File "/opt/checkqc/lib/python3.6/site-packages/checkQC/app.py", line 55, in configure_and_run
    handler_config = config.get_handler_config(instrument_and_reagent_version, read_length)
  File "/opt/checkqc/lib/python3.6/site-packages/checkQC/config.py", line 90, in get_handler_config
    raise e
  File "/opt/checkqc/lib/python3.6/site-packages/checkQC/config.py", line 82, in get_handler_config
    handler_config = self._get_matching_handler(instrument_and_reagent_type, read_length)
  File "/opt/checkqc/lib/python3.6/site-packages/checkQC/config.py", line 64, in _get_matching_handler
    raise KeyError
KeyError

Config:

[...]
hiseq2500_rapidhighoutput_v4:
  50-70:
    handlers:
      - name: ClusterPFHandler
        warning: 180 # Millons of clusters
        error: unknown
      - name: Q30Handler
        warning: 80 # Give percentage for reads greater than Q30
        error: unknown # Give percentage for reads greater than Q30
      - name: ErrorRateHandler
        allow_missing_error_rate: False
        warning: 1.5
        error: unknown
      - name: ReadsPerSampleHandler
        warning: 90 # 50 % of threshold for clusters pass filter
        error: unknown
  100-110:
[...]

I think the problem is that we are defining the read length as number of cycles - 1: https://github.com/Molmed/checkQC/blob/8911c2844b817bd85a9dd4c7d788b8d835ff8dd3/checkQC/run_type_recognizer.py#L221

and then doing an exclusive check against the lower bound: https://github.com/Molmed/checkQC/blob/8911c2844b817bd85a9dd4c7d788b8d835ff8dd3/checkQC/config.py#L94

So in my case: 51 cycles were run for read1, it is adjusted to read length 50. Then we check if 50 < 50, which is not true.

This could be fixed by not defining read length as number of cycles minus one or by doing an inclusive check against the lower bound. What do you think is the best, @johandahlberg @monikaBrandt ?

monikaBrandt commented 6 years ago

@MatildaAslin I suggest that we keep the exclusive check against the lower bound, but we change the interval to correspond to the number of cycles instead of read length. This mean that we will do a comparison against the value specified in runInfo.xml without doing any "corrections/calculations". Using the value specified in runInfo.xml, without doing modifications first, might help other users to understand our application better.

johandahlberg commented 6 years ago

I think @monikaBrandts suggestion is a good one. Does either of you want to have a look at fixing this? And perhaps also clarify in the docs that it is number of cycles that we use to define the criteria?

johandahlberg commented 6 years ago

I also think it would be great to catch the KeyError that turns up here and make it in to a more specific and descriptive exception to help the user understand what is going on.

matrulda commented 6 years ago

@monikaBrandt @johandahlberg Great suggestions! Stay tuned for a PR.