Open issue, opened by mutantjoo0 4 years ago
Hi Joo-Young,
I believe that the problems you are experiencing are due to your misuse of the cut_up_fasta.py script:
$ cut_up_fasta.py -h
usage: cut_up_fasta.py [-h] [-c CHUNK_SIZE] [-o OVERLAP_SIZE] [-m]
[-b BEDFILE]
contigs [contigs ...]
Cut up fasta file in non-overlapping or overlapping parts of equal length.
Optionally creates a BED-file where the cutup contigs are specified in terms
of the original contigs. This can be used as input to concoct_coverage_table.py.
positional arguments:
contigs Fasta files with contigs
optional arguments:
-h, --help show this help message and exit
-c CHUNK_SIZE, --chunk_size CHUNK_SIZE
Chunk size
-o OVERLAP_SIZE, --overlap_size OVERLAP_SIZE
Overlap size
-m, --merge_last Concatenate final part to last contig
-b BEDFILE, --bedfile BEDFILE
BEDfile to be created with exact regions of the
original contigs corresponding to the newly created
contigs
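To make the help text above concrete, here is a minimal Python sketch of a non-overlapping cut-up (overlap 0) with the --merge_last behaviour. It is an illustration under stated assumptions, not the actual cut_up_fasta.py source, and the function name cut_up is hypothetical. Each part is named <contig>.concoct_part_<N>, which matches the BED lines that appear later in this thread.

```python
# Hypothetical sketch of cutting one contig into fixed-size parts (overlap 0).
# Assumption: simplified reading of the cut_up_fasta.py help text, not its code.

def cut_up(contig_id, seq, chunk_size, merge_last=True):
    """Return (fasta_records, bed_lines) for one contig."""
    # Half-open [start, end) intervals, stepping by chunk_size.
    bounds = [(s, min(s + chunk_size, len(seq))) for s in range(0, len(seq), chunk_size)]
    # With merge_last, fold a short final piece into the previous chunk.
    if merge_last and len(bounds) > 1 and bounds[-1][1] - bounds[-1][0] < chunk_size:
        _, last_end = bounds.pop()
        prev_start, _ = bounds.pop()
        bounds.append((prev_start, last_end))
    records, bed_lines = [], []
    for i, (start, end) in enumerate(bounds):
        part_id = f"{contig_id}.concoct_part_{i}"
        records.append((part_id, seq[start:end]))
        # BED columns: original contig, 0-based start, exclusive end, part name.
        bed_lines.append(f"{contig_id}\t{start}\t{end}\t{part_id}")
    return records, bed_lines


records, bed_lines = cut_up("k107_12810", "A" * 301, 10000)
print(bed_lines[0].split("\t"))  # ['k107_12810', '0', '301', 'k107_12810.concoct_part_0']
```

A contig shorter than the chunk size still gets a single .concoct_part_0 entry, and with merge_last a 25 kb contig becomes two parts (10 kb and 15 kb) instead of three.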
In your code you wrote:
cut_up_fasta.py -c 10000 -m -b ./concoct_bins/SAMPLE_10k.bed > ./concoct_bins/SAMPLE_10k.fa ./input_megahit_contigs/SAMPLE.final.contigs.fa
However, your input contigs ./input_megahit_contigs/SAMPLE.final.contigs.fa are on the right-hand side of the output redirect (>). In bash, > sends whatever the command on the left-hand side writes to standard output into the file named immediately after it. Bash does still pass any remaining words to the command as arguments, but mixing inputs in after the redirect makes the command hard to read and easy to get wrong. A clearer way to write this would be:
cut_up_fasta.py -c 10000 -o 0 -m ./input_megahit_contigs/SAMPLE.final.contigs.fa -b ./concoct_bins/SAMPLE_10k.bed > ./concoct_bins/SAMPLE_10k.fa
Note that the above command should generate both the BED file (SAMPLE_10k.bed) and the cut up assembly (SAMPLE_10k.fa). Then generate the coverage table using the BED file + the sorted BAM file:
concoct_coverage_table.py ./concoct_bins/SAMPLE_10k.bed sorted.bam > coverage_table.tsv
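As far as I understand it (treat this as an assumption), concoct_coverage_table.py runs samtools bedcov on the BED file and BAM file(s), then divides each summed per-base depth by the region length. The hypothetical helper below sketches that step on plain text. It also shows why an empty bedcov result is a problem: there is nothing to parse, which pandas reports as "No columns to parse from file".

```python
# Hypothetical sketch: turn samtools-bedcov-style rows into a coverage table.
# Assumption: each row = 4 BED columns + one summed-depth column per BAM file.

def bedcov_to_table(bedcov_text, sample_names):
    """Return a tab-separated table: contig ID + mean coverage per sample."""
    rows = [line.split("\t") for line in bedcov_text.splitlines() if line.strip()]
    if not rows:
        # Nothing to parse: the same situation that makes pandas raise EmptyDataError.
        raise ValueError("empty bedcov output: do the BED contig names exist in the BAM?")
    header = ["contig"] + [f"cov_mean_sample_{name}" for name in sample_names]
    out = ["\t".join(header)]
    for _chrom, start, end, part_id, *depth_sums in rows:
        length = int(end) - int(start)
        # Mean depth = summed per-base depth / region length.
        means = [float(d) / length for d in depth_sums]
        out.append("\t".join([part_id] + [f"{m:.3f}" for m in means]))
    return "\n".join(out)


table = bedcov_to_table("k107_12810\t0\t301\tk107_12810.concoct_part_0\t903\n", ["GR25"])
print(table.splitlines()[1].split("\t"))  # ['k107_12810.concoct_part_0', '3.000']
```

The cov_mean_sample_ prefix follows the example table quoted later in this thread. Note that the row IDs come from the fourth BED column, so they are the cut-up IDs.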
Now we can run CONCOCT:
concoct --coverage_file coverage_table.tsv \
--composition_file SAMPLE_10k.fa \
-b sample_ID
Note that this approach assumes that your sorted BAM file was generated by mapping your short reads against the ORIGINAL assembly, not the cut-up assembly. Please also have a look at the CONCOCT help text to see additional useful parameters such as -c and -t:
$ concoct -h
usage: concoct [-h] [--coverage_file COVERAGE_FILE]
[--composition_file COMPOSITION_FILE] [-c CLUSTERS]
[-k KMER_LENGTH] [-t THREADS] [-l LENGTH_THRESHOLD]
[-r READ_LENGTH] [--total_percentage_pca TOTAL_PERCENTAGE_PCA]
[-b BASENAME] [-s SEED] [-i ITERATIONS]
[--no_cov_normalization] [--no_total_coverage]
[--no_original_data] [-o] [-d] [-v]
optional arguments:
-h, --help show this help message and exit
--coverage_file COVERAGE_FILE
specify the coverage file, containing a table where
each row correspond to a contig, and each column
correspond to a sample. The values are the average
coverage for this contig in that sample. All values
are separated with tabs.
--composition_file COMPOSITION_FILE
specify the composition file, containing sequences in
fasta format. It is named the composition file since
it is used to calculate the kmer composition (the
genomic signature) of each contig.
-c CLUSTERS, --clusters CLUSTERS
specify maximal number of clusters for VGMM, default
400.
-k KMER_LENGTH, --kmer_length KMER_LENGTH
specify kmer length, default 4.
-t THREADS, --threads THREADS
Number of threads to use
-l LENGTH_THRESHOLD, --length_threshold LENGTH_THRESHOLD
specify the sequence length threshold, contigs shorter
than this value will not be included. Defaults to
1000.
-r READ_LENGTH, --read_length READ_LENGTH
specify read length for coverage, default 100
--total_percentage_pca TOTAL_PERCENTAGE_PCA
The percentage of variance explained by the principal
components for the combined data.
-b BASENAME, --basename BASENAME
Specify the basename for files or directory where
outputwill be placed. Path to existing directory or
basenamewith a trailing '/' will be interpreted as a
directory.If not provided, current directory will be
used.
-s SEED, --seed SEED Specify an integer to use as seed for clustering. 0
gives a random seed, 1 is the default seed and any
other positive integer can be used. Other values give
ArgumentTypeError.
-i ITERATIONS, --iterations ITERATIONS
Specify maximum number of iterations for the VBGMM.
Default value is 500
--no_cov_normalization
By default the coverage is normalized with regards to
samples, then normalized with regards of contigs and
finally log transformed. By setting this flag you skip
the normalization and only do log transorm of the
coverage.
--no_total_coverage By default, the total coverage is added as a new
column in the coverage data matrix, independently of
coverage normalization but previous to log
transformation. Use this tag to escape this behaviour.
--no_original_data By default the original data is saved to disk. For big
datasets, especially when a large k is used for
compositional data, this file can become very large.
Use this tag if you don't want to save the original
data.
-o, --converge_out Write convergence info to files.
-d, --debug Debug parameters.
-v, --version show program's version number and exit
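The --no_cov_normalization and --no_total_coverage entries above describe CONCOCT's default coverage transformation. Below is a rough Python reading of that description: per-sample normalization, then per-contig normalization, then a log transform. It is a simplified sketch, not CONCOCT's own code, and it leaves out the extra total-coverage column. It also shows why zero coverage values can produce a "divide by zero encountered in log" warning: log(0) is undefined.

```python
import math

# Hypothetical sketch of the transformation described in the help text above.
# Assumption: simplified; omits the total-coverage column and any pseudocounts.

def transform_coverage(cov):
    """cov: list of rows (contigs), each a list of mean coverages (samples)."""
    n = len(cov[0])
    # 1) Normalize each sample (column) by its total coverage.
    col_sums = [sum(row[j] for row in cov) for j in range(n)]
    by_sample = [[row[j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)] for row in cov]
    # 2) Normalize each contig (row) so its values sum to 1.
    by_contig = []
    for row in by_sample:
        total = sum(row)
        by_contig.append([v / total if total else 0.0 for v in row])
    # 3) Log-transform. Zeros become -inf here; numpy.log(0) also returns -inf
    #    and emits a "divide by zero encountered in log" RuntimeWarning.
    return [[math.log(v) if v > 0 else float("-inf") for v in row] for row in by_contig]


print(transform_coverage([[0.0, 2.0], [2.0, 2.0]])[0])  # [-inf, 0.0]
```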
After running CONCOCT you will probably be interested in looking at the scripts merge_cutup_clustering.py and extract_fasta_bins.py to extract your draft bins.
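CONCOCT clusters the cut-up parts, so the parts of one original contig have to be mapped back to a single bin. One simple way to do that is a majority vote over the parts. The sketch below is a hypothetical illustration of that idea, not the actual merge_cutup_clustering.py implementation.

```python
from collections import Counter, defaultdict

# Hypothetical sketch: assign each original contig to the cluster that most of
# its cut-up parts fall into. Assumption: not merge_cutup_clustering.py's code.

def merge_cutup(assignments):
    """assignments: dict mapping 'contig.concoct_part_N' -> cluster ID."""
    votes = defaultdict(Counter)
    for part_id, cluster in assignments.items():
        original, sep, _ = part_id.rpartition(".concoct_part_")
        if not sep:  # ID without a cut-up suffix: use it unchanged
            original = part_id
        votes[original][cluster] += 1
    # most_common(1) picks the most frequent cluster (ties: first one seen).
    return {contig: counts.most_common(1)[0][0] for contig, counts in votes.items()}


parts = {"a.concoct_part_0": 1, "a.concoct_part_1": 1, "a.concoct_part_2": 3, "b.concoct_part_0": 2}
print(merge_cutup(parts))  # {'a': 1, 'b': 2}
```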
Hope this helps! Francisco
Hi Francisco,
Thank you for your help. I started over from step 1 and noticed differences between my previous outputs and the outputs from the suggested command. However, the error still occurs:
(concoct) -bash-4.2$ concoct_coverage_table.py GR25_c10k.bed ./bbmap_sorted_indexed_BAM_BAI/GR25_megahit_BBmapped_sorted.bam > GR25_c10k_covtab.tsv
[W::hts_idx_load2] The index file is older than the data file: ./bbmap_sorted_indexed_BAM_BAI/GR25_megahit_BBmapped_sorted.bam.bai
Errors in BED line 'k107_12810 0 301 k107_12810.concoct_part_0'
Errors in BED line 'k107_5125 0 231 k107_5125.concoct_part_0'
Errors in BED line 'k107_10248 0 404 k107_10248.concoct_part_0'
Errors in BED line 'k107_23058 0 328 k107_23058.concoct_part_0'
Errors in BED line 'k107_38430 0 346 k107_38430.concoct_part_0'
.
.
.
Errors in BED line 'k107_23054 0 784 k107_23054.concoct_part_0'
Errors in BED line 'k107_23055 0 495 k107_23055.concoct_part_0'
Errors in BED line 'k107_23056 0 391 k107_23056.concoct_part_0'
Errors in BED line 'k107_23057 0 753 k107_23057.concoct_part_0'
Traceback (most recent call last):
File "/mnt/home/leejooy5/miniconda3/envs/concoct/bin/concoct_coverage_table.py", line 91, in <module>
generate_input_table(args.bedfile, args.bamfiles, samplenames=samplenames)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/bin/concoct_coverage_table.py", line 61, in generate_input_table
df = pd.read_table(fh, header=None)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/pandas/io/parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/pandas/io/parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/pandas/io/parsers.py", line 880, in __init__
self._make_engine(self.engine)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/pandas/io/parsers.py", line 1891, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 532, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
The line numbers in the traceback were partly different from those in the previous errors, but the error descriptions look the same. Additionally, I noticed the warning [W::hts_idx_load2] The index file is older than the data file: ./bbmap_sorted_indexed_BAM_BAI/GR25_megahit_BBmapped_sorted.bam.bai, which appeared before the long list of errors began. I wonder whether this warning means that CONCOCT does not accept BBMap-generated BAM files as input, or whether my BAM files have some other issue.
For example, my input BAM files look like this:
(concoct) -bash-4.2$ samtools view bbmap_sorted_indexed_BAM_BAI/GR25_megahit_BBmapped_sorted.bam | head -n 3
K00392:163:H2H55BBXY:8:2106:22029:7169 99 k107_12810 flag=1 multi=4.0000 len=301 1 45 150= = 129 278 ATTCCATATTTTGAACACTTACTATCACATTTTTATAATGCTCTATATTTTTCTCAGCTTCTGCTATGGTTTTCTTTTGTTTGTCTGTTAGGGTCGTAGACAACAAGATTTGCTCTACCTTCTTCTCTTCGCCTCGCTTGTACGTATCAA AAAFFJJFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFJFJJJJJJJJJJJJJFJJJJAJFFJJJJFJJJJJJJFJJJJJJJJJJFJFJJJFJJJJJJJJJJJJJAJJJFJJJJJJJ NM:i:0 AM:i:45
K00392:163:H2H55BBXY:8:2106:22658:7205 99 k107_12810 flag=1 multi=4.0000 len=301 1 45 150= = 129 278 ATTCCATATTTTGAACACTTACTATCACATTTTTATAATGCTCTATATTTTTCTCAGCTTCTGCTATGGTTTTCTTTTGTTTGTCTGTTAGGGTCGTAGACAACAAGATTTGCTCTACCTTCTTCTCTTCGCCTCGCTTGTACGTATCAA AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ NM:i:0 AM:i:45
K00392:163:H2H55BBXY:8:2205:21582:45924 145 k107_12810 flag=1 multi=4.0000 len=301 6 45 150= k107_37816 flag=0 multi=5.9522 len=1152 771 0 ATATTTTGAACACTTACTATCACATTTTTATAATGCTCTATATTTTTCTCAGCTTCTGCTATGGTTTTCTTTTGTTTGTCTGTTAGGGTCGTAGACAACAAGATTTGCTCTACCTTCTTCTCTTCGCCTCGCTTGTACGTATCAAGTGTA JJJJJJJF<<JJJJJJJJJJFAJJJJJJJJFAAJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJFJJJJFAFAA NM:i:0 AM:i:45
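Comparing the reference name in the third column of the samtools output above (k107_12810 flag=1 multi=4.0000 len=301) with the first column of the failing BED lines (k107_12810) suggests one thing worth checking: whether every contig named in the BED file appears verbatim among the BAM's reference names. The hypothetical helpers below sketch that check. They assume the names come from the @SQ SN: fields of samtools view -H; here the header is passed in as a string.

```python
# Hypothetical sketch: find BED contigs that are not reference names in the BAM.
# Assumption: header_text is the output of `samtools view -H file.bam`.

def sq_names(header_text):
    """Collect reference names from @SQ header lines (SN: field)."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def missing_bed_contigs(bed_lines, bam_reference_names):
    """Return BED contig names (column 1) absent from the BAM reference names."""
    known = set(bam_reference_names)
    return [line.split("\t")[0] for line in bed_lines
            if line.strip() and line.split("\t")[0] not in known]


header = "@HD\tVN:1.6\n@SQ\tSN:k107_12810 flag=1 multi=4.0000 len=301\tLN:301\n"
bed = ["k107_12810\t0\t301\tk107_12810.concoct_part_0"]
names = sq_names(header)
print(missing_bed_contigs(bed, names))  # ['k107_12810']
print(missing_bed_contigs(bed, [n.split()[0] for n in names]))  # []
```

If the names differ only by the description after the first space, one option (a suggestion, not something confirmed in this thread) is to shorten the FASTA headers to their first word before building the index and mapping, so that the BAM, the BED file and the cut-up FASTA all use the same IDs.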
Also, I am wondering whether you have used coverage files generated with other tools such as BBMap (pileup.sh) and MetaBAT2 (jgi_summarize_bam_contig_depths), since I already have two versions of coverage depth files generated with BBMap and MetaBAT2:
(concoct) -bash-4.2$ wc -lmw cov_depth_input-bbmap_sorted_indexed_BAM-202006*/GR25*
54591 436725 3651726 cov_depth_input-bbmap_sorted_indexed_BAM-20200617_jgi_summarize_bam_contigs_depth/GR25_megahit_bbmap_sorted_depth.txt
54591 764271 4907010 cov_depth_input-bbmap_sorted_indexed_BAM-20200623_BBMAP_pileup/GR25_cov.txt
109182 1200996 8558736 total
(concoct) -bash-4.2$ head -n 3 cov_depth_input-bbmap_sorted_indexed_BAM-202006*/GR25*
==> cov_depth_input-bbmap_sorted_indexed_BAM-20200617_jgi_summarize_bam_contigs_depth/GR25_megahit_bbmap_sorted_depth.txt <==
contigName contigLen totalAvgDepth GR25_megahit_BBmapped_sorted.bam GR25_megahit_BBmapped_sorted.bam-var
k107_12810 flag=1 multi=4.0000 len=301 301 5.7351 5.7351 1.98269
k107_5125 flag=0 multi=0.8468 len=231 231 10.9012 10.9012 3.99012
==> cov_depth_input-bbmap_sorted_indexed_BAM-20200623_BBMAP_pileup/GR25_cov.txt <==
#ID Avg_fold Length Ref_GC Covered_percent Covered_bases Plus_reads Minus_reads Read_GC Median_fold Std_Dev
k107_12810 flag=1 multi=4.0000 len=301 5.0797 301 0.0000 100.0000 301 7 4 0.3603 4 1.70
k107_5125 flag=0 multi=0.8468 len=231 25.8571 231 0.0000 100.0000 231 24 22 0.4298 27 12.89
I tried running concoct --coverage_file --composition_file with the two different coverage files, and CONCOCT complained again as shown below.
#GR25, 1st try:
(concoct) -bash-4.2$ concoct -t 10 -l 1500 --coverge_file ../cov_depth_input-bbmap_sorted_indexed_BAM-20200617_jgi_summarize_bam_contigs_depth/GR25_megahit_bbmap_sorted_depth.txt --composition_fil GR25_c10k.fa -b ./GR25/GR25_j
usage: concoct [-h] [--coverage_file COVERAGE_FILE]
[--composition_file COMPOSITION_FILE] [-c CLUSTERS]
[-k KMER_LENGTH] [-t THREADS] [-l LENGTH_THRESHOLD]
[-r READ_LENGTH] [--total_percentage_pca TOTAL_PERCENTAGE_PCA]
[-b BASENAME] [-s SEED] [-i ITERATIONS]
[--no_cov_normalization] [--no_total_coverage]
[--no_original_data] [-o] [-d] [-v]
concoct: error: unrecognized arguments: --coverge_file ../cov_depth_input-bbmap_sorted_indexed_BAM-20200617_jgi_summarize_bam_contigs_depth/GR25_megahit_bbmap_sorted_depth.txt
#convert txt to tsv cp *txt *tsv
#GR25, 2nd try:
(concoct) -bash-4.2$ concoct -t 10 -l 1500 --coverge_file ../cov_depth_input-bbmap_sorted_indexed_BAM-20200617_jgi_summarize_bam_contigs_depth/GR25_cov.tsv --composition_file GR25_c10k.fa -b ./GR25/GR25_j
usage: concoct [-h] [--coverage_file COVERAGE_FILE]
[--composition_file COMPOSITION_FILE] [-c CLUSTERS]
[-k KMER_LENGTH] [-t THREADS] [-l LENGTH_THRESHOLD]
[-r READ_LENGTH] [--total_percentage_pca TOTAL_PERCENTAGE_PCA]
[-b BASENAME] [-s SEED] [-i ITERATIONS]
[--no_cov_normalization] [--no_total_coverage]
[--no_original_data] [-o] [-d] [-v]
concoct: error: unrecognized arguments: --coverge_file ../cov_depth_input-bbmap_sorted_indexed_BAM-20200617_jgi_summarize_bam_contigs_depth/GR25_cov.tsv
#GR25, 3rd try:
(concoct) -bash-4.2$ concoct -t 10 -l 1500 --coverge_file ../cov_depth_input-bbmap_sorted_indexed_BAM-20200623_BBMAP_pileup/GR25_cov.txt --composition_file GR25_c10k.fa -b ./GR25/GR25_b
usage: concoct [-h] [--coverage_file COVERAGE_FILE]
[--composition_file COMPOSITION_FILE] [-c CLUSTERS]
[-k KMER_LENGTH] [-t THREADS] [-l LENGTH_THRESHOLD]
[-r READ_LENGTH] [--total_percentage_pca TOTAL_PERCENTAGE_PCA]
[-b BASENAME] [-s SEED] [-i ITERATIONS]
[--no_cov_normalization] [--no_total_coverage]
[--no_original_data] [-o] [-d] [-v]
concoct: error: unrecognized arguments: --coverge_file ../cov_depth_input-bbmap_sorted_indexed_BAM-20200623_BBMAP_pileup/GR25_cov.txt
#GR25, 4th try:
(concoct) -bash-4.2$ concoct -t 10 -l 1500 --coverge_file ../cov_depth_input-bbmap_sorted_indexed_BAM-20200623_BBMAP_pileup/GR25_cov.tsv --composition_file GR25_c10k.fa -b ./GR25/GR25_b
usage: concoct [-h] [--coverage_file COVERAGE_FILE]
[--composition_file COMPOSITION_FILE] [-c CLUSTERS]
[-k KMER_LENGTH] [-t THREADS] [-l LENGTH_THRESHOLD]
[-r READ_LENGTH] [--total_percentage_pca TOTAL_PERCENTAGE_PCA]
[-b BASENAME] [-s SEED] [-i ITERATIONS]
[--no_cov_normalization] [--no_total_coverage]
[--no_original_data] [-o] [-d] [-v]
concoct: error: unrecognized arguments: --coverge_file ../cov_depth_input-bbmap_sorted_indexed_BAM-20200623_BBMAP_pileup/GR25_cov.tsv
I assume that a CONCOCT-generated coverage file must have a specific header format and a different number of columns. Could you help me modify the headers and columns in my coverage files? If you can provide an example of a CONCOCT coverage file, it would be a great help for moving forward. Thank you again for your time and help!
Stay healthy, Joo-Young
Hi Joo-Young,
Sorry to hear that you are having trouble interacting with CONCOCT and your conda environments. I can also suggest you try out KBase. This platform lets you interact with an HPC cluster more easily through a graphical user interface, giving you access to CONCOCT and many other bioinformatic tools without having to install or set up anything or write any code yourself. However, you do need to make a free account and upload your files to their servers. Although this can be prohibitive, I think it's still a good place to start playing around with these tools without getting bogged down by installation and troubleshooting details. There are other alternative platforms too, e.g. Galaxy.
I see spelling mistakes in the parameters in all four of your attempts to run CONCOCT. In particular, you consistently misspelled the parameter --coverage_file as --coverge_file, hence the error you see in every attempt:
concoct: error: unrecognized arguments: --coverge_file ../cov_depth_input-bbmap_sorted_indexed_BAM-20200617_jgi_summarize_bam_contigs_depth/GR25_megahit_bbmap_sorted_depth.txt
I have never used BBMap for CONCOCT; I generally use bwa mem for all mapping operations.
If you want to use MetaBAT2's jgi_summarize_bam_contig_depths, or want to see what a working coverage table looks like, I can refer you to post #286, particularly the second comment:
I took a closer look and I suspect that my method for generating the concoct_coverage.table would only work if the jgi_summarize_bam_contig_depths output depth files are generated on sorted bam files that are mapped against the cut up contigs.
As you can see in the post, if I have 3 samples then a coverage table may look like this:
$ less master_covtable_coverage_table_ERR599120.tsv|head
contig cov_mean_sample_ERR599120 cov_mean_sample_ERR599121 cov_mean_sample_ERR599122
k119_371504-flag=1-multi=2.0000-len=322.concoct_part_0 3.137 2.050 1.370
k119_451110-flag=1-multi=2.0000-len=321.concoct_part_0 1.885 0.000 0.000
If you are only mapping against the focal sample, then the coverage table would have only 2 columns: one with contig IDs (note that the .concoct_part_X extension in the IDs tells us that these are the "cut-up" contigs that we want) and one with the corresponding coverage.
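Following the format described above, here is a hypothetical sketch that reshapes a jgi_summarize_bam_contig_depths depth file into a CONCOCT-style table: a contig ID column plus one mean-coverage column per BAM. It assumes a tab-separated input whose per-BAM columns come after totalAvgDepth, and it drops the columns ending in -var. As the quoted post points out, this is only expected to work if the depth file was generated from BAM files mapped against the cut-up contigs, so that the IDs carry the .concoct_part_N suffix.

```python
# Hypothetical sketch: jgi depth file -> CONCOCT-style coverage table.
# Assumptions: tab-separated input, header contains "totalAvgDepth", per-BAM
# mean columns follow it, and variance columns end in "-var".

def depth_to_concoct(depth_text):
    lines = [line for line in depth_text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    start = header.index("totalAvgDepth") + 1
    keep = [i for i in range(start, len(header)) if not header[i].endswith("-var")]
    out = ["\t".join(["contig"] + [header[i] for i in keep])]
    for line in lines[1:]:
        fields = line.split("\t")
        out.append("\t".join([fields[0]] + [fields[i] for i in keep]))
    return "\n".join(out)


# Hypothetical input: a depth file computed against cut-up contigs.
depth = ("contigName\tcontigLen\ttotalAvgDepth\tGR25.bam\tGR25.bam-var\n"
         "k107_12810.concoct_part_0\t301\t5.7351\t5.7351\t1.98269\n")
print(depth_to_concoct(depth).splitlines()[1].split("\t"))  # ['k107_12810.concoct_part_0', '5.7351']
```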
Best of luck! Francisco
Hi Francisco,
Thank you for your kind support. I re-ran the commands without the typos. As you can see below, I got the same error with either coverage file, whether from jgi_summarize_bam_contig_depths or from pileup.sh.
(concoct) -bash-4.2$ concoct -l 1500 --coverage_file ../cov_depth_input-bbmap_sorted_indexed_BAM-20200617_jgi_summarize_bam_contigs_depth/GR25_cov.tsv --composition_file GR25_c10k.fa -b ./concoct_output/GR25/
WARNING:root:CONCOCT is running in single threaded mode. Please, consider adjusting the --threads parameter.
Up and running. Check /mnt/ufs18/rs-002/Reguera_Kashefi_Lab/JYL/phylophlan_MAGs/concoct_bins/concoct_output/GR25/log.txt for progress
/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/concoct/input.py:115: RuntimeWarning: divide by zero encountered in log
cov.loc[:,cov_range[0]:cov_range[1]])
Traceback (most recent call last):
File "/mnt/home/leejooy5/miniconda3/envs/concoct/bin/concoct", line 90, in <module>
results = main(args)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/bin/concoct", line 40, in main
args.seed
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/concoct/transform.py", line 5, in perform_pca
pca_object = PCA(n_components=nc, random_state=seed).fit(d)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/sklearn/decomposition/_pca.py", line 351, in fit
self._fit(X)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/sklearn/decomposition/_pca.py", line 398, in _fit
ensure_2d=True, copy=self.copy)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/sklearn/base.py", line 420, in _validate_data
X = check_array(X, **check_params)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/sklearn/utils/validation.py", line 73, in inner_f
return f(**kwargs)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/sklearn/utils/validation.py", line 654, in check_array
context))
ValueError: Found array with 0 sample(s) (shape=(0, 141)) while a minimum of 1 is required.
(concoct) -bash-4.2$ concoct -t 10 -l 1500 --coverage_file ../cov_depth_input-bbmap_sorted_indexed_BAM-20200623_BBMAP_pileup/GR25_cov.txt --composition_file GR25_c10k.fa -b ./concoct_output/GR25b/
Up and running. Check /mnt/ufs18/rs-002/Reguera_Kashefi_Lab/JYL/phylophlan_MAGs/concoct_bins/concoct_output/GR25b/log.txt for progress
/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/concoct/input.py:115: RuntimeWarning: divide by zero encountered in log
cov.loc[:,cov_range[0]:cov_range[1]])
Traceback (most recent call last):
File "/mnt/home/leejooy5/miniconda3/envs/concoct/bin/concoct", line 90, in <module>
results = main(args)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/bin/concoct", line 40, in main
args.seed
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/concoct/transform.py", line 5, in perform_pca
pca_object = PCA(n_components=nc, random_state=seed).fit(d)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/sklearn/decomposition/_pca.py", line 351, in fit
self._fit(X)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/sklearn/decomposition/_pca.py", line 398, in _fit
ensure_2d=True, copy=self.copy)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/sklearn/base.py", line 420, in _validate_data
X = check_array(X, **check_params)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/sklearn/utils/validation.py", line 73, in inner_f
return f(**kwargs)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/sklearn/utils/validation.py", line 654, in check_array
context))
ValueError: Found array with 0 sample(s) (shape=(0, 147)) while a minimum of 1 is required.
Then I tried modifying the coverage file from MetaBAT2 as described in your post (#286, https://github.com/BinPro/CONCOCT/issues/286) and ran CONCOCT.
(concoct) -bash-4.2$ head -n 5 ../cov_depth_input-bbmap_sorted_indexed_BAM-20200617_jgi_summarize_bam_contigs_depth/GR25_megahit_bbmap_sorted_depth.txt
contigName contigLen totalAvgDepth GR25_megahit_BBmapped_sorted.bam GR25_megahit_BBmapped_sorted.bam-var
k107_12810 flag=1 multi=4.0000 len=301 301 5.7351 5.7351 1.98269
k107_5125 flag=0 multi=0.8468 len=231 231 10.9012 10.9012 3.99012
k107_10248 flag=1 multi=1.0000 len=404 404 4.05512 4.05512 3.47521
k107_23058 flag=1 multi=2.0000 len=328 328 1.79213 1.79213 0.470674
(concoct) -bash-4.2$ for depth in *depth.txt;do less $depth|cut -f4 > $depth.col;done
(concoct) -bash-4.2$ less GR25_megahit_bbmap_sorted_depth.txt|cut -f1 > GR25_rownames
(concoct) -bash-4.2$ paste GR25_rownames GR25_megahit_bbmap_sorted_depth.txt.col > GR25_concoct_cov.table
(concoct) -bash-4.2$ head -n 5 ../cov_depth_input-bbmap_sorted_indexed_BAM-20200617_jgi_summarize_bam_contigs_depth/GR25_concoct_cov.table
contigName GR25_megahit_BBmapped_sorted.bam
k107_12810 flag=1 multi=4.0000 len=301 5.7351
k107_5125 flag=0 multi=0.8468 len=231 10.9012
k107_10248 flag=1 multi=1.0000 len=404 4.05512
k107_23058 flag=1 multi=2.0000 len=328 1.79213
(concoct) -bash-4.2$ concoct -l 1500 -t 30 --coverage_file ../cov_depth_input-bbmap_sorted_indexed_BAM-20200617_jgi_summarize_bam_contigs_depth/GR25_concoct_cov.table --composition_file GR25_c10k.fa -b ./concoct_output/GR25_concoct
Up and running. Check /mnt/ufs18/rs-002/Reguera_Kashefi_Lab/JYL/phylophlan_MAGs/concoct_bins/concoct_output/GR25_concoct_log.txt for progress
Traceback (most recent call last):
File "/mnt/home/leejooy5/miniconda3/envs/concoct/bin/concoct", line 90, in <module>
results = main(args)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/bin/concoct", line 40, in main
args.seed
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/concoct/transform.py", line 5, in perform_pca
pca_object = PCA(n_components=nc, random_state=seed).fit(d)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/sklearn/decomposition/_pca.py", line 351, in fit
self._fit(X)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/sklearn/decomposition/_pca.py", line 398, in _fit
ensure_2d=True, copy=self.copy)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/sklearn/base.py", line 420, in _validate_data
X = check_array(X, **check_params)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/sklearn/utils/validation.py", line 73, in inner_f
return f(**kwargs)
File "/mnt/home/leejooy5/miniconda3/envs/concoct/lib/python3.6/site-packages/sklearn/utils/validation.py", line 654, in check_array
context))
ValueError: Found array with 0 sample(s) (shape=(0, 138)) while a minimum of 1 is required.
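All three runs fail with "Found array with 0 sample(s)", i.e. no contigs reached the PCA step. One plausible explanation (an assumption, not confirmed in this thread) is that no contig ID in the coverage table matches an ID in the composition file: the tables above use IDs such as k107_12810 flag=1 multi=4.0000 len=301, whereas GR25_c10k.fa contains cut-up IDs such as k107_12810.concoct_part_0. The -l 1500 threshold may also drop many of these short contigs. The hypothetical helper below counts how many composition contigs would survive both filters, assuming CONCOCT keeps contigs present in both files and at least as long as -l.

```python
# Hypothetical sketch: how many composition contigs could reach clustering?
# Assumption: kept = ID present in both files AND length >= length_threshold.

def count_usable(fasta_text, coverage_text, length_threshold):
    seqs, current = {}, None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            current = line[1:].split()[0]  # FASTA ID = first word of the header
            seqs[current] = 0
        elif current is not None:
            seqs[current] += len(line.strip())
    cov_lines = [line for line in coverage_text.splitlines() if line.strip()]
    cov_ids = {line.split("\t")[0] for line in cov_lines[1:]}  # skip header row
    shared = [c for c in seqs if c in cov_ids]
    long_enough = [c for c in shared if seqs[c] >= length_threshold]
    return len(seqs), len(shared), len(long_enough)


fasta = ">k107_12810.concoct_part_0\nACGT\n>k107_5125.concoct_part_0\nAC\n"
cov = "contigName\tGR25.bam\nk107_12810 flag=1 multi=4.0000 len=301\t5.7351\n"
print(count_usable(fasta, cov, 1500))  # (2, 0, 0)
```

A middle value of 0 points to an ID mismatch; a non-zero middle value with a final value of 0 points to the length threshold.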
In the end, I still got the same error. I noticed that you used 1 composition file and 1 coverage file for 3 samples. I wonder whether that is required to use CONCOCT, and how I can create these files for multiple samples that have different numbers of contigs. Thank you for your help!
Cheers, Joo-Young Thanks
Hello CONCOCT team (@andand @alneberg @binnisb @inodb )
I am getting an error while running concoct_coverage_table.py. I am using CONCOCT in a miniconda environment. My contig FASTA files were generated with MEGAHIT, and the mapped/sorted/indexed BAM files were generated with BBMap.
First, following the workflow described in Basic Usage, I cut up the contigs by running
cut_up_fasta.py -c 10000 -m -b ./concoct_bins/SAMPLE_10k.bed > ./concoct_bins/SAMPLE_10k.fa ./input_megahit_contigs/SAMPLE.final.contigs.fa
Then I got an error in the second step, when running concoct_coverage_table.py. I tested multiple samples and the same error occurred every time. I assume that CONCOCT does not recognize the columns in the BED file produced by the first step, cut_up_fasta.py. I have attached an example of my error. I wonder at which step I can fix this issue, for example by running cut_up_fasta.py with a larger or smaller -c setting. Please enlighten me. Thank you in advance for your time and support.
Added on June 30, 2020:
I tried a few different approaches to fix this issue: 1) using different parameters for step 1 (changing cut_up_fasta.py -c from 10k to 20k); 2) creating a new concoct environment and installing CONCOCT there; 3) installing the optional dependencies bedtools, picard, samtools, bowtie2, GNU parallel and pysam in the concoct environment.
The following are the corresponding results from each trial. Please note that I only copied and pasted the commands and the traceback parts showing the errors.
1) errors from running on concoct environment
2) errors from different parameter set in step 1
3) errors after installing optional dependencies
As you can see, running concoct_coverage_table.py after installing the optional dependencies reduced the number of errors. However, there are still errors at line 77 and line 28. Please help me understand and fix this problem.
Stay healthy, Joo-Young