TrinityCTAT / ctat-mutations

Mutation detection using GATK4 best practices and latest RNA editing filters resources. Works with both Hg38 and Hg19
https://github.com/TrinityCTAT/ctat-mutations

gatk CreateSequenceDictionary command issue #86

Closed readline closed 3 years ago

readline commented 3 years ago

I'm preparing the ctat-mutations library with the official Singularity image, but an error occurred:

> singularity exec -e -B /gs9,/data,/home,/lscratch /path/to/ctat_mutations.v3.0.0.simg /usr/local/src/ctat-mutations/mutation_lib_prep/ctat-mutation-lib-integration.py --genome_lib_dir /path/to/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir
2021-03-24 22:53:13,315: Generating /path/to/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir/ref_genome.dict
Using GATK jar /usr/local/src/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /usr/local/src/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar CreateSequenceDictionary -R /path/to/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir/ref_genome.fa -O /path/to/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir/ref_genome.dict VALIDATION_STRINGENCY=LENIENT
INFO    2021-03-25 02:53:15     CreateSequenceDictionary

** NOTE: Picard's command line syntax is changing.
**
** For more information, please see:
** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**
** The command line looks like this in the new syntax:
**
**    CreateSequenceDictionary -R /path/to/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir/ref_genome.fa -O /path/to/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir/ref_genome.dict -VALIDATION_STRINGENCY LENIENT
**

ERROR: Invalid argument '-R'.

USAGE: CreateSequenceDictionary [options]

Documentation: http://broadinstitute.github.io/picard/command-line-overview.html#CreateSequenceDictionary

Creates a sequence dictionary for a reference sequence. This tool creates a sequence dictionary file (with ".dict" extension) from a reference sequence provided in FASTA format, which is required by many processing and analysis tools. The output file contains a header but no SAMRecords, and the header contains only sequence records.

The reference sequence can be gzipped (both .fasta and .fasta.gz are supported).
Usage example:

java -jar picard.jar CreateSequenceDictionary \
R=reference.fasta \
O=reference.dict

After I modified the source code of ctat-mutation-lib-integration.py to use the "R=reference.fasta O=reference.dict" format, it worked.
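For readers hitting the same error, here is a minimal Python sketch of the workaround described above. It is not the actual change made to ctat-mutation-lib-integration.py: the function names `has_mixed_syntax` and `legacy_create_sequence_dictionary_cmd` are hypothetical, and the assumption (suggested by the log, not confirmed in this thread) is that combining dash-style options such as `-R` with a legacy `KEY=VALUE` argument is what triggers `Invalid argument '-R'`.

```python
import re

# Legacy Picard arguments look like KEY=VALUE with no leading dash.
LEGACY_ARG = re.compile(r"^[A-Za-z_]+=.+")


def has_mixed_syntax(args):
    """Return True if args mix dash-style options with legacy KEY=VALUE arguments."""
    has_legacy = any(LEGACY_ARG.match(a) for a in args)
    has_dashed = any(a.startswith("-") for a in args)
    return has_legacy and has_dashed


def legacy_create_sequence_dictionary_cmd(ref_fasta, out_dict):
    """Build the command using only the legacy R=/O= form that worked for the reporter."""
    return [
        "gatk",
        "CreateSequenceDictionary",
        f"R={ref_fasta}",
        f"O={out_dict}",
        "VALIDATION_STRINGENCY=LENIENT",
    ]


if __name__ == "__main__":
    # The argument list from the failing run, with shortened placeholder paths.
    failing = [
        "gatk", "CreateSequenceDictionary",
        "-R", "ref_genome.fa",
        "-O", "ref_genome.dict",
        "VALIDATION_STRINGENCY=LENIENT",
    ]
    print(has_mixed_syntax(failing))
    print(has_mixed_syntax(legacy_create_sequence_dictionary_cmd("ref_genome.fa", "ref_genome.dict")))
```

The resulting list can be passed to `subprocess.check_call` in place of the mixed-style one. Whether the maintainers fixed the image this way or by switching every argument to the new style is not stated in the thread.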

joshua-gould commented 3 years ago

Thanks, I’ll take a look.


joshua-gould commented 3 years ago

I updated the singularity image. Please re-download. Thanks for reporting.

ConcettaDe4 commented 3 years ago

Hi! I am building the "mutation lib integration utility" using the singularity image ctat_mutations.v3.0.0.simg with the command:

singularity exec -e -B /path/to/your/ctat_genome_lib_build_dir \
      ctat-mutations.simg \
      /usr/local/src/ctat-mutations/mutation_lib_prep/ctat-mutation-lib-integration.py \
      --genome_lib_dir /path/to/your/ctat_genome_lib_build_dir

Unfortunately, I got this error: ERROR: Invalid argument '-R'.

Do you have any suggestions for fixing the problem? Thank you.

Concetta

brianjohnhaas commented 3 years ago

Hi,

Is there a full error message that you can post? I'm wondering if there's more info that might help guide us.

thx,

~brian


ConcettaDe4 commented 3 years ago

Here is the error message:

INFO:    Converting SIF file to temporary sandbox...
2021-05-13 15:28:14,332: Generating /home/stefania/RNAseq_project/test_ctat_mutations/GRCh38_gencode_v22_CTAT_lib_Mar012021/ctat_genome_lib_build_dir/ref_genome.dict
Using GATK jar /usr/local/src/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /usr/local/src/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar CreateSequenceDictionary -R /home/stefania/RNAseq_project/test_ctat_mutations/GRCh38_gencode_v22_CTAT_lib_Mar012021/ctat_genome_lib_build_dir/ref_genome.fa -O /home/stefania/RNAseq_project/test_ctat_mutations/GRCh38_gencode_v22_CTAT_lib_Mar012021/ctat_genome_lib_build_dir/ref_genome.dict VALIDATION_STRINGENCY=LENIENT
INFO    2021-05-13 15:28:20     CreateSequenceDictionary

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    CreateSequenceDictionary -R /home/stefania/RNAseq_project/test_ctat_mutations/GRCh38_gencode_v22_CTAT_lib_Mar012021/ctat_genome_lib_build_dir/ref_genome.fa -O /home/stefania/RNAseq_project/test_ctat_mutations/GRCh38_gencode_v22_CTAT_lib_Mar012021/ctat_genome_lib_build_dir/ref_genome.dict -VALIDATION_STRINGENCY LENIENT
**********

ERROR: Invalid argument '-R'.

USAGE: CreateSequenceDictionary [options]

Documentation: http://broadinstitute.github.io/picard/command-line-overview.html#CreateSequenceDictionary

Creates a sequence dictionary for a reference sequence.  This tool creates a sequence dictionary file (with ".dict"
extension) from a reference sequence provided in FASTA format, which is required by many processing and analysis tools.
The output file contains a header but no SAMRecords, and the header contains only sequence records.

The reference sequence can be gzipped (both .fasta and .fasta.gz are supported).
Usage example:

java -jar picard.jar CreateSequenceDictionary \
R=reference.fasta \
O=reference.dict

Version: 4.1.9.0

Options:

--help
-h                            Displays options specific to this tool.

--stdhelp
-H                            Displays options specific to this tool AND options common to all Picard command line
                              tools.

--version                     Displays program version.

OUTPUT=File
O=File                        Output SAM file containing only the sequence dictionary. By default it will use the base
                              name of the input reference with the .dict extension  Default value: null.

GENOME_ASSEMBLY=String
AS=String                     Put into AS field of sequence dictionary entry if supplied  Default value: null.

URI=String
UR=String                     Put into UR field of sequence dictionary entry.  If not supplied, input reference file is
                              used  Default value: null.

SPECIES=String
SP=String                     Put into SP field of sequence dictionary entry  Default value: null.

TRUNCATE_NAMES_AT_WHITESPACE=Boolean
                              Make sequence name the first word from the > line in the fasta file.  By default the
                              entire contents of the > line is used, excluding leading and trailing whitespace.  Default
                              value: true. This option can be set to 'null' to clear the default value. Possible values:
                              {true, false}

NUM_SEQUENCES=Integer         Stop after writing this many sequences.  For testing.  Default value: 2147483647. This
                              option can be set to 'null' to clear the default value.

ALT_NAMES=File
AN=File                       Optional file containing the alternative names for the contigs. Tools may use this
                              information to consider different contig notations as identical (e.g: 'chr1' and '1'). The
                              alternative names will be put into the appropriate @AN annotation for each contig. No
                              header. First column is the original name, the second column is an alternative name. One
                              contig may have more than one alternative name.  Default value: null.

REFERENCE=File
R=File                        Input reference fasta or fasta.gz  Required.

Tool returned:
1
Traceback (most recent call last):
  File "/usr/local/src/ctat-mutations/mutation_lib_prep/ctat-mutation-lib-integration.py", line 66, in <module>
    subprocess.check_call(cmd)
  File "/opt/conda/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['gatk', 'CreateSequenceDictionary', '-R', '/home/stefania/RNAseq_project/test_ctat_mutations/GRCh38_gencode_v22_CTAT_lib_Mar012021/ctat_genome_lib_build_dir/ref_genome.fa', '-O', '/home/stefania/RNAseq_project/test_ctat_mutations/GRCh38_gencode_v22_CTAT_lib_Mar012021/ctat_genome_lib_build_dir/ref_genome.dict', 'VALIDATION_STRINGENCY=LENIENT']' returned non-zero exit status 4.
INFO:    Cleaning up image...
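The traceback above comes from `subprocess.check_call`, which raises `CalledProcessError` whenever the child process exits with a non-zero status. As a small illustration of how a calling script could report that more compactly, here is a sketch with a hypothetical helper `run_checked`; a child Python process that exits with status 4 stands in for the failing `gatk` call.

```python
import subprocess
import sys


def run_checked(cmd):
    """Run cmd; return 0 on success or the non-zero exit status on failure."""
    try:
        subprocess.check_call(cmd)
    except subprocess.CalledProcessError as err:
        # err.cmd is the argument list and err.returncode the exit status,
        # the same two values that appear in the traceback message.
        print(f"command failed with exit status {err.returncode}: {err.cmd}",
              file=sys.stderr)
        return err.returncode
    return 0


if __name__ == "__main__":
    # Stand-in for the failing tool: a child process that exits with status 4.
    status = run_checked([sys.executable, "-c", "import sys; sys.exit(4)"])
    print(status)
```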

brianjohnhaas commented 3 years ago

I see. Give me a second and I'll update the simg file. More shortly.


brianjohnhaas commented 3 years ago

I'm in the process of rebuilding the docker and singularity images, so I should have a fix for you in the next hour.


brianjohnhaas commented 3 years ago

here you go:

https://data.broadinstitute.org/Trinity/CTAT_SINGULARITY/CTAT_MUTATIONS/ctat_mutations.v3.0.1.simg

please let me know how it goes.

best,

~brian


ConcettaDe4 commented 3 years ago

Hi! Sorry for my late reply. Now it works!

Thank you for your help.

Best,

Concetta

brianjohnhaas commented 3 years ago

great! thanks for the update
