AlexsLemonade / alsf-scpca

Management and analysis tools for ALSF Single-cell Pediatric Cancer Atlas data.
BSD 3-Clause "New" or "Revised" License

Filter alevin (and other single cell workflows) to 10X data #42

Closed jashapiro closed 3 years ago

jashapiro commented 4 years ago

Currently, the single-cell quantification workflows require lists of run IDs to process. This is good for testing, but if and when we want to run the full data sets, we will want a convenient way to just run everything that is 10X data. The metadata file already lists technology as a column, so we can and should filter on that to restrict the runs to 10X single-cell data.

At the same time, we may well want to implement --run_ids All in a way similar to what I have done for the check-md5.nf workflow: https://github.com/AlexsLemonade/alsf-scpca/blob/726c8223c3369085ad3867510d0c281b5f280eb1/workflows/checks/check-md5.nf#L28-L32

Implementing both the technology filter and the run-all option might be as straightforward as:

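  // split the comma-separated run_ids parameter; an unset parameter gives an empty list
  run_ids = params.run_ids?.tokenize(',') ?: []
  // passing "All" as the first run ID selects every run
  run_all = run_ids[0] == "All"
  // read the metadata table, keep only 10Xv3 runs (other 10X versions would need
  // to be added to the first filter), then keep all runs or only the requested IDs
  ch_runs = Channel.fromPath(params.run_metafile)
    .splitCsv(header: true, sep: '\t')
    .filter{it.technology == "10Xv3"}
    .filter{run_all || (it.scpca_run_id in run_ids)}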
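
To check the selection logic without running Nextflow, a plain Groovy sketch along these lines might help. The helper name `selectRuns` and the sample rows (including the run IDs and the `bulk` technology value) are made up for illustration; only the `technology` and `scpca_run_id` column names and the `10Xv3` value come from the snippet above.

```groovy
// Hypothetical helper mirroring the two channel filters on an in-memory list.
// Each row is a map keyed by column name, as splitCsv(header: true) would produce.
def selectRuns(List<Map> rows, String runIdsParam) {
    // an unset parameter (null) becomes an empty list of run IDs
    def runIds = runIdsParam?.tokenize(',') ?: []
    // indexing an empty Groovy list returns null, so runAll is false in that case
    def runAll = runIds[0] == "All"
    rows.findAll { it.technology == "10Xv3" }
        .findAll { runAll || (it.scpca_run_id in runIds) }
}

// Made-up metadata rows for illustration only
def rows = [
    [scpca_run_id: "run01", technology: "10Xv3"],
    [scpca_run_id: "run02", technology: "bulk"],
    [scpca_run_id: "run03", technology: "10Xv3"],
]

// "All" keeps every 10Xv3 run; a list of IDs keeps only the 10Xv3 runs among them
def allRuns = selectRuns(rows, "All")
def someRuns = selectRuns(rows, "run02,run03")
```

One thing this makes visible: with no run IDs given, nothing is selected, so the full data set is only processed when `All` is passed explicitly.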