elsasserlab / minute

MINUTE-ChIP data analysis workflow
https://minute.readthedocs.io
MIT License

minute init with a simplified barcodes.tsv specification #168

cnluzon closed this pull request 1 year ago

cnluzon commented 1 year ago

The aim of this PR is to make it easier to generate the libraries.tsv and groups.tsv files for the standard use of minute, in which the barcode setup tends to be the same for all FASTQ files provided in a single run. When it is not, the data can be split into separate minute runs.

I added two parameters to minute init to simplify generating these tables: --barcodes and --input. The user only needs to fill out a single barcodes table and specify which FASTQ file pair is the input.
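To make the two inputs concrete, here is a minimal sketch of the checks such an init step could perform: parse the single barcodes table and confirm that the --input value names an existing FASTQ pair. This is not minute's actual code. The three-column barcodes layout (sample, replicate, barcode) and the _R1.fastq.gz/_R2.fastq.gz naming convention are hypothetical assumptions made for illustration.

```python
import csv
import io

R1_SUFFIX = "_R1.fastq.gz"  # hypothetical naming convention for read 1
R2_SUFFIX = "_R2.fastq.gz"  # hypothetical naming convention for read 2


def read_barcodes(text):
    """Parse a simplified barcodes table (hypothetical columns: sample, replicate, barcode)."""
    rows = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if len(row) != 3:
            raise ValueError(f"expected 3 columns, got {len(row)}: {row}")
        rows.append(tuple(field.strip() for field in row))
    if not rows:
        raise ValueError("barcodes table is empty")
    return rows  # file order is kept, so rows[0] is the first line


def find_fastq_pairs(filenames):
    """Return sorted base names that have both an R1 and an R2 file."""
    r1 = {n[: -len(R1_SUFFIX)] for n in filenames if n.endswith(R1_SUFFIX)}
    r2 = {n[: -len(R2_SUFFIX)] for n in filenames if n.endswith(R2_SUFFIX)}
    return sorted(r1 & r2)


def check_input(pairs, input_base):
    """Fail early if the --input value does not name one of the FASTQ pairs."""
    if input_base not in pairs:
        raise ValueError(f"--input {input_base!r} matches no FASTQ pair in {pairs}")
    return input_base
```

Keeping the file order of the barcodes table matters here, because the first line has a special role later on.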

The idea is that the provided barcode configuration is applied in the same way to every FASTQ file pair in the fastq directory, and each sample is normalized to the matching sample in the input pair specified by --input. This always normalizes to the first line of the provided --barcodes file, and samples default to pooled. This step only generates libraries.tsv and groups.tsv with this information; the user is then free to edit these files further before running minute run.
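The expansion described above can be sketched as follows, under one reading of it: every barcode line is repeated for every FASTQ pair, each non-input library gets the same-named sample from the input pair as its control, replicates are "pooled", and the first barcode line serves as the reference within each FASTQ pair. The row layouts, the "<pair>_<sample>" naming scheme and the one-scaling-group-per-pair choice are hypothetical assumptions; the columns minute actually writes to libraries.tsv and groups.tsv may differ.

```python
def generate_tables(barcodes, fastq_bases, input_base):
    """Apply one barcode layout to every FASTQ pair and build library and group rows.

    barcodes: list of (sample, replicate, barcode) tuples in file order (hypothetical layout).
    """
    if input_base not in fastq_bases:
        raise ValueError(f"input FASTQ pair {input_base!r} not found")

    # Library rows: the same barcode layout is reused unchanged for every FASTQ pair.
    libraries = []
    for base in fastq_bases:
        for sample, replicate, barcode in barcodes:
            libraries.append((f"{base}_{sample}", replicate, barcode, base))

    # Distinct sample names in first-appearance order; the first one is the reference.
    samples = list(dict.fromkeys(sample for sample, _, _ in barcodes))
    reference = samples[0]

    groups = []
    for base in fastq_bases:
        if base == input_base:
            continue  # the input pair only provides controls
        for sample in samples:
            groups.append((
                f"{base}_{sample}",        # treatment
                "pooled",                  # replicates pooled by default
                f"{input_base}_{sample}",  # matching sample from the input pair
                base,                      # hypothetical: one scaling group per FASTQ pair
                f"{base}_{reference}",     # first barcodes line as reference
            ))
    return libraries, groups
```

Because the output is just two tables, a user whose experiment does not fit this default can still edit the rows by hand before running the workflow, which matches the intent of generating the files rather than hard-coding the layout.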

I think we developed the software around a more flexible use case, but in practice that case almost never comes up, and supporting it overcomplicates the setup (and the workflow itself) for the much simpler case that we almost always run.

Edit: I want to highlight that I added a second minute run to the test.sh script. I am aware this might be undesirable because it adds to the testing time, so it is open for discussion whether we keep both runs or pick only one of the use cases for the CI script. If we pick only one, I would go with this new use case, because it is more frequent than the custom setup with the downloaded, already demultiplexed file.

cnluzon commented 1 year ago

Thanks @marcelm! I will merge this branch if there are no further comments so you can update the ongoing PRs!