biosails / pheniqs

Fast and accurate sequence demultiplexing

demultiplexing by multiple barcode positions #38

Closed SimonasJocys closed 1 year ago

SimonasJocys commented 1 year ago

Hello!

Pheniqs is an amazing piece of software and seems to be just the right tool for custom demultiplexing of Illumina runs. But I have trouble understanding the configuration options and possibilities, especially regarding output.

I would like to demultiplex a FASTQ file based on barcodes in multiple positions (combinatorial barcoding) and write the output to a separate file for each barcode combination. While it seems I can achieve the decoding with a modification of the following configuration, I can't find an option to write output to separate files based on barcode combinations rather than segments:

{
    "PL": "ILLUMINA",
    "cellular": [
        {
            "algorithm": "pamld",
            "base": "SPLiT-seq 96",
            "comment": "First round ",
            "confidence threshold": 0.99,
            "noise": 0.05,
            "transform": {
                "token": [
                    "0::8"
                ]
            }
        },
        {
            "algorithm": "pamld",
            "base": "SPLiT-seq 96",
            "comment": "Second round",
            "confidence threshold": 0.99,
            "noise": 0.05,
            "transform": {
                "token": [
                    "0:12:20"
                ]
            }
        }
    ],
    "import": [
        "splitseq_core_barcodes.json"
    ],
    "molecular": [
        {
            "transform": {
                "token": [
                    "0::"
                ]
            }
        }
    ],
    "template": {
        "transform": {
            "token": [
                "0::40"
            ]
        }
    }
}

I would appreciate any help regarding demultiplexing output configuration options or strategies.

Thanks a lot for your help, Simon

moonwatcher commented 1 year ago

Hi Simon

Thank you for using Pheniqs 😀

Pheniqs will not allow you to specify output directives on more than one decoder. That means you can't have it do exactly what you want in one step.

You can, however, achieve that in multiple steps. You can decode the first round of barcodes and split the output into multiple files, then process each of those files further in a second run.

Pheniqs is very fast, which makes the two-step process pretty efficient.

One of the reasons we picked JSON as the configuration file format is that it's easy to generate configuration files in Python.

I think the fluidigm example does something like that.
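
As a rough illustration of generating the second-step configurations in Python, here is a minimal sketch. It reuses the second-round decoder from the configuration posted above and builds one configuration per first-step output file. The function names, the file names and the ".round2.json" suffix are hypothetical, and the "input" key is an assumption about how the configuration lists its input files. The per-barcode output directives are left out because their syntax is not shown in this thread.

```python
import json

# Second-round decoder, copied from the configuration posted above.
SECOND_ROUND = {
    "algorithm": "pamld",
    "base": "SPLiT-seq 96",
    "comment": "Second round",
    "confidence threshold": 0.99,
    "noise": 0.05,
    "transform": {"token": ["0:12:20"]},
}


def second_step_config(first_round_file):
    """Build a step-two configuration for one step-one output file."""
    return {
        "PL": "ILLUMINA",
        "import": ["splitseq_core_barcodes.json"],
        # ASSUMPTION: "input" is the key that lists the files to read.
        "input": [first_round_file],
        # Copy the decoder so each configuration can be edited independently.
        "cellular": [dict(SECOND_ROUND)],
    }


def write_second_step_configs(first_round_files):
    """Write one JSON configuration per file and return the paths written."""
    paths = []
    for name in first_round_files:
        path = name + ".round2.json"  # hypothetical naming scheme
        with open(path, "w") as handle:
            json.dump(second_step_config(name), handle, indent=4)
        paths.append(path)
    return paths
```

Calling `write_second_step_configs` with the list of files produced by the first run would leave one configuration next to each of them, ready to be completed with output directives and passed to the second run.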

The two-step process is also more "statistically correct", since you can estimate the priors for each step from the data, giving you conditional Bayesian probabilities on a more specific dataset.
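
To make the idea of estimating priors from the data concrete, here is a minimal Python sketch. It is not Pheniqs's own estimator. It assumes the simplest possible scheme: each barcode's prior is its share of the reads classified in a preliminary run, scaled so that all barcode priors sum to one minus the noise prior. The function name and the barcode names in the usage note are hypothetical.

```python
def estimate_priors(counts, noise=0.05):
    """Turn per-barcode read counts into priors that sum to 1 - noise.

    ASSUMPTION: the prior of a barcode is proportional to the number of
    reads assigned to it in a preliminary run; `noise` is the share of
    probability reserved for reads that match no barcode.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no classified reads to estimate priors from")
    return {
        barcode: (1.0 - noise) * n / total
        for barcode, n in counts.items()
    }
```

For example, `estimate_priors({"A": 30, "B": 10}, noise=0.05)` assigns three quarters of the non-noise probability to barcode A. Run per first-round output file, the counts describe only the reads that already carry a given first-round barcode, which is what makes the resulting second-round priors conditional.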

If you need a more concrete example, maybe you can provide a tiny example set and I can walk you through the configuration.

Lior.