Molmed / checkQC

CheckQC inspects the content of an Illumina runfolder and determines if it passes a set of quality criteria
http://checkqc.readthedocs.io/
GNU General Public License v3.0

Supporting NextSeq and MiniSeq #65

Open avilella opened 6 years ago

avilella commented 6 years ago

Hi, what would be needed to support NextSeq and MiniSeq instruments? Anything I can provide?

johandahlberg commented 6 years ago

Hi!

Adding new instruments should be relatively simple. What is needed is:

  1. implement a new class for each of the new instruments here: https://github.com/Molmed/checkQC/blob/ccade4f13a191b8480ccea75ba65dbeba0263aae/checkQC/run_type_recognizer.py#L61
  2. add the correct instrument identifier prefix here: https://github.com/Molmed/checkQC/blob/ccade4f13a191b8480ccea75ba65dbeba0263aae/checkQC/run_type_recognizer.py#L187
  3. implement reasonable default QC criteria in the config file
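
Steps 1 and 2 could look roughly like the sketch below. It is a standalone illustration and not checkQC's actual code: `Instrument`, `NextSeq`, `PREFIX_TO_INSTRUMENT` and `instrument_from_id` are hypothetical names. The prefixes are the ones reported further down in this thread (NS and NB, plus NDX for the DX variant) and should be treated as unverified; MiniSeq is left out because no prefix for it has been reported here.

```python
import re


class Instrument:
    """Hypothetical base class; checkQC's real base class may differ."""

    @staticmethod
    def name():
        raise NotImplementedError


class NextSeq(Instrument):
    """Step 1: one class per new instrument."""

    @staticmethod
    def name():
        return "nextseq"


# Step 2: map instrument identifier prefixes to instrument classes.
# Assumption: these prefixes come from reports in this thread only.
PREFIX_TO_INSTRUMENT = {
    "NS": NextSeq,
    "NB": NextSeq,
    "NDX": NextSeq,
}


def instrument_from_id(instrument_id):
    """Return the instrument class for an ID such as 'NS500559'."""
    # The prefix is taken to be the leading run of capital letters
    # that is followed by a digit.
    match = re.match(r"^([A-Z]+)\d", instrument_id)
    if match is None:
        raise ValueError(f"Cannot find a prefix in {instrument_id!r}")
    prefix = match.group(1)
    if prefix not in PREFIX_TO_INSTRUMENT:
        raise ValueError(f"Unsupported instrument prefix {prefix!r}")
    return PREFIX_TO_INSTRUMENT[prefix]
```

With this in place, `instrument_from_id("NS500559")` returns the `NextSeq` class, and an unknown prefix raises a `ValueError` instead of silently picking a default.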

I'd be happy to add support for them. What I would need from you is the prefix the instruments use, and what you think would be reasonable default QC criteria. And since I don't have access to data from these instruments, it would be great if you could do some beta testing to make sure that everything works (or, if possible, send me some data that I could try it out on).

apeltzer commented 6 years ago

Hm, I have a couple of NextSeq 500 runs here that I could get my hands on.

johandahlberg commented 6 years ago

That's great, @apeltzer. I found some information which indicated that the NextSeq instruments have serial numbers that start with SN, is that correct? If I get a pre-release out, would you be willing to beta test it?

apeltzer commented 6 years ago

I guess I could do that, yes. Regarding the serial number, I will check; it could very well be the case.

apeltzer commented 6 years ago

This is what a normal FASTQ file looks like here:

@NS500559:25:HJHMNBGXX:1:11101:4226:1073 1:N:0:TTACTTCT+CTAACTTA
GATCTNGGTCTGGTTTCATCCGCGGCATTTTGCCACCCTGACCGGAGTGGTCTTTGCCGTCGGTTATCTGGGAAA
+
AAAAA#EEEEEEEEEEEEEEAEEEEEEE/A<EAE/EEAEEEEAEEEEEAEAA/E/EEEEEEEEEEAAA6EEEEE/
@NS500559:25:HJHMNBGXX:1:11101:18957:1076 1:N:0:TTACTTCT+CTAACTTA

johandahlberg commented 6 years ago

Thanks! Do you have any idea about what values can be used to differentiate between the High and Mid-output modes of the instrument? I'm guessing that information would be available somewhere like the runParameters.xml, but since I don't have a runfolder I can't check it.

apeltzer commented 6 years ago

I'm linking in Stephen here, who should have access to such runParameters.xml files. Could you maybe make some available to Johan for that purpose? One for High and one for Mid Output mode on a NextSeq 500?

@sc13-bioinf

cbrueffer commented 4 years ago

Here's some information based on our NextSeq 550 DX. The DX instrument version is certified for diagnostic use, so it has a different instrument ID, e.g. NDX550213 in our case.

I don't have access to medium output kit runs, but the high output ones have this in RunParameters.xml under the RunParameters node: <Chemistry>NextSeq High</Chemistry>

johandahlberg commented 4 years ago

Sorry for the very late reply @cbrueffer, and thank you for the information. While we don't currently have the resources to implement this, we would very much welcome a PR to fix it.

There is a stale PR here https://github.com/Molmed/checkQC/pull/69 where I started work on this, which should take you through most of the changes that need to be made.

cbrueffer commented 4 years ago

No worries Johan; I haven't had time to look into this further yet (hopefully soon), but for now I can at least add some more information:

The mid output kit is marked as <Chemistry>NextSeq Mid</Chemistry> in RunParameters.xml.
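
To illustrate how the two Chemistry values could be used to tell the modes apart, here is a minimal sketch using only the Python standard library. It assumes that `Chemistry` sits somewhere below the `RunParameters` root node and that "NextSeq High" and "NextSeq Mid" are the only values of interest; `output_mode` and `CHEMISTRY_TO_MODE` are hypothetical names, not part of checkQC.

```python
import xml.etree.ElementTree as ET

# Assumption: only the two values reported in this thread are handled.
CHEMISTRY_TO_MODE = {
    "NextSeq High": "high",
    "NextSeq Mid": "mid",
}


def output_mode(run_parameters_xml):
    """Return 'high' or 'mid' for the text of a RunParameters.xml file."""
    root = ET.fromstring(run_parameters_xml)
    # Search all descendants so the exact nesting depth does not matter.
    chemistry = root.findtext(".//Chemistry")
    if chemistry is None:
        raise ValueError("No Chemistry element found")
    chemistry = chemistry.strip()
    if chemistry not in CHEMISTRY_TO_MODE:
        raise ValueError(f"Unknown chemistry {chemistry!r}")
    return CHEMISTRY_TO_MODE[chemistry]
```

For a file on disk, the text could be read first and passed in, or `ET.parse(path).getroot()` could replace the `ET.fromstring` call.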

ID strings for the regular NextSeq start with @NS and @NB, according to https://github.com/OpenGene/fastp/blob/e30ec117f2dd45148942064128f0c9b3a48876e3/src/evaluator.cpp#L25
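
A small sketch of pulling the instrument ID out of a FASTQ header and checking it against those two prefixes. It assumes headers shaped like the example posted earlier in this thread (`@<instrument>:<run>:<flowcell>:...`); `instrument_id_from_header` and `looks_like_regular_nextseq` are hypothetical helper names.

```python
import re

# Assumption: the header starts with '@', then the instrument ID,
# then ':'-separated run number and flowcell ID.
HEADER_RE = re.compile(r"^@(?P<instrument>[^:\s]+):(?P<run>\d+):(?P<flowcell>[^:\s]+):")


def instrument_id_from_header(header_line):
    """Extract the instrument ID from a FASTQ header line."""
    match = HEADER_RE.match(header_line)
    if match is None:
        raise ValueError(f"Unrecognized FASTQ header: {header_line!r}")
    return match.group("instrument")


def looks_like_regular_nextseq(header_line):
    """True if the instrument ID starts with NS or NB."""
    return instrument_id_from_header(header_line).startswith(("NS", "NB"))
```

Applied to the header posted above, `instrument_id_from_header` gives `NS500559`. Note that a DX instrument ID such as NDX550213 would not match these two prefixes and would need its own entry.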

matrulda commented 4 years ago

Nice! I can add that Illumina has NextSeq and MiniSeq data in its demo data collection (requires registration to access), which could perhaps serve as test data.

maleasy commented 3 years ago

I was wondering if there is any progress in supporting NextSeq?