sc_seq pipeline fails due to missing process/function keySet()

syellamilli commented 8 months ago

I was trying to run the sc_seq processing pipeline on a new batch of AUTOIPI data and received the following error:

N E X T F L O W  ~  version 22.10.4
Launching `pipeline_pre_fmx_qc.nf` [fervent_hopper] DSL2 - revision: 4ab27ea59f
Missing process or function keySet()

 -- Check script 'pipeline_pre_fmx_qc.nf' at line: 56 or see '.nextflow.log' file for more details

The pipeline was run using the following command:

sbatch -x c4-n20 run.sh /krummellab/data1/DSCoLab/AUTOIPI/nextflow/AIP1_12_22_KA12_13_DB42_44_45.json pre_fmx_qc

The repository was up to date with main at the time of writing (my local copy is at /krummellab/data1/syellamilli/software/data_processing_pipelines/).

JSON Contents:

{ "project_dir" : "/krummellab/data1/immunox/AUTOIPI",
  "settings" : {
    "add_tcr" : false,
    "add_bcr" : false,
    "skip_cellranger": false,
    "merge_for_demux" : true,
    "merge_demux_dir" : "/krummellab/data1/syellamilli/freemuxlet_data/",
    "demux_method" : "freemuxlet",
    "run_doubletfinder" : true,
    "mincell" : 3,
    "minfeature" : 100,
    "default_qc_cuts_file": "default_qc_cuts.csv",
    "randomseed" : 21212,
    "remove_demux_DBL": true,
    "remove_all_DBL": true
  },
  "pools" : [
    {
      "KA12" : {
      "nsamples" : "15",
      "libraries": {
        "AIP1-POOL-KA12-SCG1": {
          "ncells_loaded": 60000,
          "data_types": []
        },
        "AIP1-POOL-KA12-SCG2" : {
          "ncells_loaded": 60000,
          "data_types": []
        },
        "AIP1-POOL-KA12-SCG3" : {
          "ncells_loaded": 60000,
          "data_types": []
        },
        "AIP1-POOL-KA12-SCG4" : {
          "ncells_loaded": 60000,
          "data_types": []
        }
      }
    },
    "KA13" : {
      "nsamples" : "10",
      "libraries": {
        "AIP1-POOL-KA13-SCG1": {
          "ncells_loaded": 60000,
          "data_types": []
        },
        "AIP1-POOL-KA13-SCG2" : {
          "ncells_loaded": 60000,
          "data_types": []
        },
        "AIP1-POOL-KA13-SCG3" : {
          "ncells_loaded": 60000,
          "data_types": []
        },
      }
    },
    "DB42" : {
      "nsamples" : "4",
      "libraries": {
        "AIP1-POOL-DB42-SCG1": {
          "ncells_loaded": 60000,
          "data_types": ["CITE"]
        }
      }
    },
    "DB44" : {
      "nsamples" : "10",
      "libraries": {
        "AIP1-POOL-DB44-SCG1": {
          "ncells_loaded": 64680,
          "data_types": ["CITE"]
        }
      }
    },
    "DB45" : {
      "nsamples" : "10",
      "libraries": {
        "AIP1-POOL-DB45-SCG1": {
          "ncells_loaded": 68680,
          "data_types": ["CITE"]
        }
      }
    }
  }
  ]
}

erflynn commented 8 months ago

I wonder if this is a pipeline version issue -- which branch are you running?

erflynn commented 8 months ago

The config structure has been updated in main, so this is both a compatibility issue and a branch issue. The add_vdj_v2 branch fixes this but also introduces additional features.

amadeovezz commented 8 months ago

So this is just a JSON incompatibility issue. The fmx pipelines/JSON still haven't been upgraded to the new JSON format.

From the README.md:

fmx_param_1.json: Similar to param_1.json except its format is intended for the pre_fmx and post_fmx pipelines.

The issue right now is that "pools" is a list, and it should be a dictionary.

If you take a look at fmx_param_1.json you can see an example of the desired format :)
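
For anyone hitting the same error, here is a rough sketch of the conversion in Python. It assumes the only change needed is turning the "pools" list into a single dictionary keyed by pool name; fmx_param_1.json remains the authoritative reference for the layout, and other fields may also differ. The helper name `pools_list_to_dict` is made up for illustration and is not part of the pipeline.

```python
import json


def pools_list_to_dict(config):
    """Return a shallow copy of config whose "pools" entry is one dictionary.

    Assumption: each element of the "pools" list is itself a dict mapping
    pool names to their settings, as in the JSON posted in this issue.
    """
    pools = config.get("pools")
    if isinstance(pools, dict):
        return dict(config)  # already in the dictionary shape, nothing to do

    merged = {}
    for entry in pools or []:
        for name, settings in entry.items():
            if name in merged:
                # Refuse to silently overwrite a pool that appears twice.
                raise ValueError(f"duplicate pool name: {name}")
            merged[name] = settings

    fixed = dict(config)
    fixed["pools"] = merged
    return fixed


if __name__ == "__main__":
    # Tiny stand-in for the real config; the values are placeholders.
    example = {
        "project_dir": "/path/to/project",
        "pools": [
            {
                "POOL_A": {"nsamples": "4", "libraries": {}},
                "POOL_B": {"nsamples": "10", "libraries": {}},
            }
        ],
    }
    print(json.dumps(pools_list_to_dict(example), indent=2))
```

To use it on a real file, load the config with `json.load`, pass it through the helper, and write the result back out with `json.dump` before resubmitting the job.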

UCSF-DSCOLAB / data_processing_pipelines

sc_seq pipeline fails due to missing process/function keySet() #66