alexyermanos / Platypus

R package for the analysis of single-cell immune repertoires
GNU General Public License v3.0

VDJ_call_MIXCR issue #37

Closed: kbruton closed this issue 1 year ago

kbruton commented 1 year ago

I've followed the PlatypusV3 vignette (2022-08-14) and was able to generate a vgm (gex_vdj.int) with VDJ_GEX_matrix. With VDJ_call_MIXCR, however, I receive the error below. MiXCR was downloaded, unpacked, and licensed, and I have specified the path to the mixcr.jar file. It seems MiXCR expects a .tsv file but is instead receiving tempmixcrlc.out.vdjca. Any guidance you can provide would be much appreciated.

VDJ_mixcr_out <- VDJ_call_MIXCR(VDJ = gex_vdj.int[[1]], mixcr.directory = "~/Downloads/mixcr-4/", species = "mmu", platypus.version = "v3", operating.system = "Darwin", simplify = TRUE)

Missing required option: '--preset '
Usage: java -jar mixcr.jar align --preset [--trimming-quality-threshold ] [--trimming-window-size ] [--write-all] [--tag-pattern-file ] [--tag-parse-unstranded] [--tag-max-budget ] [--read-id-as-cell-tag] [--read-buffer ] [--high-compression] [--not-aligned-R1 ] [--not-aligned-R2 ] [--not-parsed-R1 ] [--not-parsed-R2 ] [--species ] [--library ] [--dna] [--rna] [--floating-left-alignment-boundary []] [--rigid-left-alignment-boundary []] [--floating-right-alignment-boundary (|)] [--rigid-right-alignment-boundary [(|)]] [--tag-pattern ] [--keep-non-CDR3-alignments] [--drop-non-CDR3-alignments] [--limit-input ] [--assemble-clonotypes-by ] [--split-clones-by ]... [--dont-split-clones-by ]... [--assemble-contigs-by ] [--impute-germline-on-export] [--dont-impute-germline-on-export] [--prepend-export-clones-field [...]]... [--append-export-clones-field [...]]... [--prepend-export-alignments-field [...]]... [--append-export-alignments-field [...]]... [-O ]... [-M ]... [--report ] [--json-report ] [--threads ] [--force-overwrite] [--no-warnings] [--verbose] [--help] (file_R1.fastq[.gz] file_R2.fastq[.gz]|file_RN.(fastq[.gz]|fasta|bam|sam)) alignments.vdjca
Builds alignments with V,D,J and C genes for input sequencing reads.
(file_R1.fastq[.gz] file_R2.fastq[.gz]|file_RN.(fastq[.gz]|fasta|bam|sam)) Two fastq files for paired reads or one file for single read data. Use {{n}} if you want to concatenate files from multiple lanes, like: my_file_L{{n}}_R1.fastq.gz my_file_L{{n}}_R2.fastq.gz
alignments.vdjca Path where to write output alignments
-p, --preset Analysis preset. Sets all significant parameters of this and all downstream analysis steps. This is a required parameter. It is very important to carefully select the most appropriate preset for the data you analyse.
--trimming-quality-threshold Read pre-processing: trimming quality threshold. Zero value can be used to skip trimming. Default value determined by the preset.
--trimming-window-size Read pre-processing: trimming window size. Default value determined by the preset.
--write-all Write alignment results for all input reads (even if alignment failed). Default value determined by the preset.
--tag-pattern-file Read tag pattern from a file. Default tag pattern determined by the preset.
--tag-parse-unstranded If paired-end input is used, determines whether to try all combinations of mate-pairs or only match reads to the corresponding pattern sections (i.e. first file to first section, etc...). Default value determined by the preset.
--tag-max-budget Maximal bit budget, higher values allows more substitutions in small letters. Default value determined by the preset.
--read-id-as-cell-tag Marks reads, coming from different files, but having the same positions in those files, as reads coming from the same cells. Main use-case is protocols with overlapped alpha-beta, gamma-delta or heavy-light cDNA molecules, where each side was sequenced by separate mate pairs in a paired-end sequencer. Use special expansion group CELLSPLIT instead of R index (i.e. "my_file_R{{CELLSPLIT:n}}.fastq.gz"). Default value determined by the preset.
--read-buffer Size of buffer for FASTQ readers in bytes. Default: 4Mb
--high-compression Use higher compression for output file, 10~25% slower, minus 30~50% of file size.
--not-aligned-R1 Pipe not aligned R1 reads into separate file.
--not-aligned-R2 Pipe not aligned R2 reads into separate file.
--not-parsed-R1 Pipe not parsed R1 reads into separate file.
--not-parsed-R2 Pipe not parsed R2 reads into separate file.
-s, --species Species (organism). Possible values: `hsa` (or HomoSapiens), `mmu` (or MusMusculus), `rat`, `spalax`, `alpaca`, `lamaGlama`, `mulatta` (_Macaca Mulatta_), `fascicularis` (_Macaca Fascicularis_) or any species from IMGT ® library.
-b, --library V/D/J/C gene library. By default, the `default` MiXCR reference library is used. One can also use external libraries
--dna For DNA starting material. Setups V gene feature to align to `VGeneWithP` (full intron) and also instructs MiXCR to skip C gene alignment since it is too far from CDR3 in DNA data.
--rna For RNA starting material; setups `VTranscriptWithP` (full exon) gene feature to align for V gene and `CExon1` for C gene.
--floating-left-alignment-boundary [] Configures aligners to use semi-local alignment at reads 5'-end. Typically used with V gene single primer / multiplex protocols, or if there are non-trimmed adapter sequences at 5'-end. Optional may be specified to instruct MiXCR where the primer is located and strip V feature to align accordingly, resulting in a more precise alignments.
--rigid-left-alignment-boundary [] Configures aligners to use global alignment at reads 5'-end. Typically used for 5'RACE with template switch oligo or a like protocols. Optional may be specified to instruct MiXCR how to strip V feature to align.
--floating-right-alignment-boundary (|) Configures aligners to use semi-local alignment at reads 3'-end. Typically used with J or C gene single primer / multiplex protocols, or if there are non-trimmed adapter sequences at 3'-end. Requires either gene type (`J` for J primers / `C` for C primers) or to be specified. In latter case MiXCR will additionally strip feature to align accordingly.
--rigid-right-alignment-boundary [(|)] Configures aligners to use global alignment at reads 3'-end. Typically used for J-C intron single primer / multiplex protocols. Optional (`J` for J primers / `C` for C primers) or may be specified to instruct MiXCR where how to strip J or C feature to align.
--tag-pattern Specify tag pattern for barcoded data.
--keep-non-CDR3-alignments Preserve alignments that do not cover CDR3 region or cover it only partially in the .vdjca file.
--drop-non-CDR3-alignments Drop all alignments that do not cover CDR3 region or cover it only partially.
--limit-input Maximal number of reads to process on `align`
-O Overrides aligner parameters from the selected preset
-M Overrides preset parameters
-r, --report Report file (human readable version, see `-j / --json-report` for machine readable report).
-j, --json-report JSON formatted report file.
-t, --threads Processing threads
-f, --force-overwrite Force overwrite of output file(s).
-nw, --no-warnings Suppress all warning messages.
--verbose Verbose warning messages.
-h, --help Show this help message and exit.
Params for assemble command:
--assemble-clonotypes-by Specify gene features used to assemble clonotypes. One may specify any custom gene region (e.g. `FR3+CDR3`); target clonal sequence can even be disjoint. Note that `assemblingFeatures` must cover CDR3
--split-clones-by Clones with equal clonal sequence but different gene will not be merged.
--dont-split-clones-by Clones with equal clonal sequence but different gene will be merged into single clone.
Params for assembleContigs command:
--assemble-contigs-by Selects the region of interest for the action. Clones will be separated if inconsistent nucleotides will be detected in the region, assembling procedure will be limited to the region, and only clonotypes that fully cover the region will be outputted, others will be filtered out.
Params for export commands:
--impute-germline-on-export Export nucleotide sequences using letters from germline (marked lowercase) for uncovered regions
--dont-impute-germline-on-export Export nucleotide sequences only from covered region
--prepend-export-clones-field [...] Add clones export column before other columns. First param is field name as it is in `exportClones` command, left params are params of the field
--append-export-clones-field [...] Add clones export column after other columns. First param is field name as it is in `exportClones` command, left params are params of the field
--prepend-export-alignments-field [...] Add clones export column before other columns.
First param is field name as it is in `exportAlignments` command, left params are params of the field
--append-export-alignments-field [...] Add clones export column after other columns. First param is field name as it is in `exportAlignments` command, left params are params of the field
Require tsv file type, got tempmixcrhc.out.vdjca
Missing required option: '--preset '
Require tsv file type, got tempmixcrlc.out.vdjca
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file 'tempmixcrhc.out.txt': No such file or directory
vickreiner commented 1 year ago

Hi,

This seems to be a version issue with MiXCR. I will check the function against the newest release as soon as possible. In the meantime, have you tried an older MiXCR release?

mizraelson commented 1 year ago

Hi, I am one of the developers of MiXCR. Since v4, MiXCR requires a preset to be specified. There are several presets for different types of data and for commercially available kits. You can read more about them here: https://mixcr.com/mixcr/reference/overview-built-in-presets/

Presets are required by both the mixcr analyze command and mixcr align (the initial step of analyze).
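To make the developer's point concrete, here is a minimal Python sketch (not part of Platypus or MiXCR) that assembles a MiXCR v4 `align` command line from the pieces shown in the usage message earlier in this thread (`align`, `--preset`, `--species`, one or two read files, then an output `.vdjca` file) and fails early when no preset is given. The helper name `build_mixcr_align_command` and the preset string `example-preset` are hypothetical placeholders; a real preset has to be picked from the MiXCR built-in preset list linked above, and this is not a claim about how VDJ_call_MIXCR invokes MiXCR internally.

```python
import os
import shlex
from typing import List, Optional, Sequence


def build_mixcr_align_command(
    mixcr_jar: str,
    preset: Optional[str],
    inputs: Sequence[str],
    output_vdjca: str,
    species: Optional[str] = None,
) -> List[str]:
    """Return the argv for `java -jar mixcr.jar align` in MiXCR v4 style.

    Hypothetical helper: MiXCR v4 refuses to run `align` without --preset,
    so this raises a clear error instead of letting MiXCR print its usage.
    """
    if not preset:
        raise ValueError("MiXCR v4 requires --preset; choose one from the built-in preset list")
    if len(inputs) not in (1, 2):
        # The usage text accepts one file (single reads) or an R1/R2 pair.
        raise ValueError("expected one input file or an R1/R2 pair")
    if not output_vdjca.endswith(".vdjca"):
        raise ValueError("align writes its alignments to a .vdjca file")

    # expanduser() resolves paths like "~/Downloads/..." used in the R call above.
    cmd = ["java", "-jar", os.path.expanduser(mixcr_jar), "align", "--preset", preset]
    if species:
        cmd += ["--species", species]
    cmd += list(inputs)
    cmd.append(output_vdjca)
    return cmd


if __name__ == "__main__":
    # "example-preset" is a placeholder, not a real MiXCR preset name.
    argv = build_mixcr_align_command(
        "~/Downloads/mixcr-4/mixcr.jar",
        preset="example-preset",
        inputs=["sample_R1.fastq.gz", "sample_R2.fastq.gz"],
        output_vdjca="sample.vdjca",
        species="mmu",
    )
    # Print the shell-quoted command; run it with subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

Checking for the preset before calling MiXCR surfaces the actual cause ("no preset") instead of the long usage dump and the follow-on "cannot open file" error seen in the original report.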