Illumina / interop

C++ Library to parse Illumina InterOp files
http://illumina.github.io/interop/index.html
GNU General Public License v3.0
75 stars 26 forks source link

Access to channel names #318

Closed sybrohee closed 1 year ago

sybrohee commented 1 year ago

Hello,

No really a bug here, let's say an advice ... Since a few days, I have been really trying to find a way to obtain the channels names (green, blue, red, A,C,T,G) from the channel numerical value (0,1,2,3) (extraction metrics)

    extra['valid_to_load'] = create_valid_to_load(('Index', 'Tile', "Extraction"))
    run_metrics = read(runFolder, **extra)
    extraction_metric_set = run_metrics.extraction_metric_set()
    cc = extraction_metric_set.at(0).channel_count()

From there, knowing the instrument and the number of channels, I can infer whether the channel names are (ACGT) or (red green) or even (blue,green) but don't like the idea of hardcoding it the names of the channels in my function.

Hereafter, the function I use at the moment but I am not totally happy with them as my function getChannelTypes harcodes the name of the sequencer whereas I'd really like to rely completely on the interop library.

Thank you for your fantastic job with interop and for any help you could provide me with.

def getInstrumentType(runFolder, **extra):
    run_parameters = interop_run.parameters()
    run_parameters.read(runFolder)
    instrument_type = run_parameters.instrument_type()
    instrument_name = interop_run.to_string_instrument_type(instrument_type)
    return (instrument_name)

def getChannelTypes(instrument):
    # hardcoded channels as in channel.h
    channels = []
    if instrument in ['NovaSeq', 'NextSeq', 'MiniSeq', 'NextSeq1k2k']:
        channels = ['Red', 'Green']
    elif instrument in ['MiSeq']:
        channels = ['A', 'C', 'G', 'T']
    else:
        os.write(2, b"Error : Instrument "+ instrument + " not supported\n")
        exit(1)
    return (channels)

def parseExtractionMetricsOutBin(runFolder, **extra):
    ## This function is an adaptation of the indexing https://github.com/Illumina/interop/blob/master/src/ext/python/core.py
    ## The idea is to reimplement dumptext --metric=ExtractionMetric
    extra['valid_to_load'] = create_valid_to_load(('Index', 'Tile', "Extraction"))
    run_metrics = read(runFolder, **extra)
    if not isinstance(run_metrics, interop_metrics.run_metrics):
        raise ValueError("Expected interop.py_interop_run_metrics.run_metrics or str for `run_metrics`")
    instrument = getInstrumentType(runFolder)
    channels = getChannelTypes(instrument)
    extraction_metric_set = run_metrics.extraction_metric_set()
    column_names =  [("Lane", int), ("Tile", int),("Cycle", int), ("TimeStamp", int)]
    for channel in channels:
        column_names += [("MaxIntensity_" + channel, int)]
    for channel in channels:
        column_names += [("Focus_" + channel, float)]
    channel_count = extraction_metric_set.at(1).channel_count()
    table = np.zeros(extraction_metric_set.size(), dtype=column_names)
    k = 0
    for i in range(extraction_metric_set.size()):
        metric = extraction_metric_set.at(i)
        if channel_count == 2:
            table[k] = (metric.lane()
                        , metric.tile()
                        , metric.cycle()
                        , metric.date_time()
                        , metric.max_intensity(0)
                        , metric.max_intensity(1)
                        , metric.focus_score(0)
                        , metric.focus_score(1)
                        )
        elif channel_count == 4:
            table[k] = (metric.lane()
                        , metric.tile()
                        , metric.cycle()
                        , metric.date_time()
                        , metric.max_intensity(0)
                        , metric.max_intensity(1)
                        , metric.max_intensity(2)
                        , metric.max_intensity(3)                        
                        , metric.focus_score(0)
                        , metric.focus_score(1)
                        , metric.focus_score(2)
                        , metric.focus_score(3)                        
                        )            
        k += 1
    return table
ezralanglois commented 1 year ago

This should do it on all systems (assuming the RunInfo.xml and RunParameters.xml) are present.

run_metrics = read(runFolder, **extra)
print(run_metrics.run_info().channels())
sybrohee commented 1 year ago

So simple but also so complicated ... I'll have a look asap and tell you whether you are indeed my hero. Thanks a lot again.

sybrohee commented 1 year ago

The solution you proposed is indeed working. Thanks for everything!