y9c closed this issue 7 years ago
uname -a
Linux 94 4.4.0-51-generic #72-Ubuntu SMP Thu Nov 24 18:29:54 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
What's your commit tag, and what is the command line that is failing? It seems to be failing to link against zlib.
@dnbh I just ran `make` after `git clone`. The full log:
HEAD is now at f90ae8b... Add zr2
/tmp/cchPfbS8.ltrans0.ltrans.o: In function `kseq_read(kseq_t*) [clone .lto_priv.291] [clone .lto_priv.295]':
<artificial>:(.text+0xaf9): undefined reference to `gzread'
<artificial>:(.text+0xcae): undefined reference to `gzread'
<artificial>:(.text+0xd84): undefined reference to `gzread'
<artificial>:(.text+0xeb2): undefined reference to `gzread'
<artificial>:(.text+0xf04): undefined reference to `gzread'
/tmp/cchPfbS8.ltrans0.ltrans.o: In function `bmf::sdmp_main(int, char**) [clone .part.13]':
<artificial>:(.text+0x27d8): undefined reference to `gzopen'
<artificial>:(.text+0x27f2): undefined reference to `gzopen'
<artificial>:(.text+0x2845): undefined reference to `gzopen'
<artificial>:(.text+0x2a6d): undefined reference to `gzopen'
<artificial>:(.text+0x2a87): undefined reference to `gzopen'
<artificial>:(.text+0x2c77): undefined reference to `gzprintf'
<artificial>:(.text+0x2e02): undefined reference to `gzprintf'
<artificial>:(.text+0x2e38): undefined reference to `gzclose'
<artificial>:(.text+0x2e42): undefined reference to `gzclose'
<artificial>:(.text+0x2e5d): undefined reference to `gzclose'
<artificial>:(.text+0x303e): undefined reference to `gzprintf'
<artificial>:(.text+0x308a): undefined reference to `gzprintf'
<artificial>:(.text+0x3291): undefined reference to `gzprintf'
<artificial>:(.text+0x32c3): undefined reference to `gzprintf'
<artificial>:(.text+0x330d): undefined reference to `gzclose'
<artificial>:(.text+0x3317): undefined reference to `gzclose'
<artificial>:(.text+0x3321): undefined reference to `gzclose'
<artificial>:(.text+0x3343): undefined reference to `gzclose'
<artificial>:(.text+0x3354): undefined reference to `gzclose'
/tmp/cchPfbS8.ltrans0.ltrans.o: In function `bmf::idmp_main(int, char**)':
<artificial>:(.text+0x3afd): undefined reference to `gzopen'
<artificial>:(.text+0x3e19): undefined reference to `gzprintf'
<artificial>:(.text+0x402e): undefined reference to `gzprintf'
<artificial>:(.text+0x4084): undefined reference to `gzclose'
<artificial>:(.text+0x40b4): undefined reference to `gzclose'
<artificial>:(.text+0x4459): undefined reference to `gzopen'
<artificial>:(.text+0x4473): undefined reference to `gzopen'
<artificial>:(.text+0x47e2): undefined reference to `gzprintf'
<artificial>:(.text+0x482f): undefined reference to `gzprintf'
<artificial>:(.text+0x4c82): undefined reference to `gzprintf'
<artificial>:(.text+0x4cb6): undefined reference to `gzprintf'
<artificial>:(.text+0x4e02): undefined reference to `gzprintf'
/tmp/cchPfbS8.ltrans0.ltrans.o:<artificial>:(.text+0x4e35): more undefined references to `gzprintf' follow
/tmp/cchPfbS8.ltrans0.ltrans.o: In function `bmf::idmp_main(int, char**)':
<artificial>:(.text+0x4ea0): undefined reference to `gzclose'
<artificial>:(.text+0x4eb5): undefined reference to `gzclose'
<artificial>:(.text+0x4ef5): undefined reference to `gzclose'
<artificial>:(.text+0x4eff): undefined reference to `gzclose'
<artificial>:(.text+0x5005): undefined reference to `gzprintf'
/tmp/cchPfbS8.ltrans1.ltrans.o: In function `bmf::hash_inmem_inline_core(char*, char*, char*, char*, char*, int, int, int, int, int)':
<artificial>:(.text+0x162): undefined reference to `gzdopen'
<artificial>:(.text+0x186): undefined reference to `gzdopen'
<artificial>:(.text+0xc3d): undefined reference to `gzclose'
<artificial>:(.text+0xc4a): undefined reference to `gzclose'
<artificial>:(.text+0x271b): undefined reference to `gzputs'
<artificial>:(.text+0x273c): undefined reference to `gzputs'
<artificial>:(.text+0x3aa5): undefined reference to `gzputs'
<artificial>:(.text+0x3aff): undefined reference to `gzputs'
<artificial>:(.text+0x3bfa): undefined reference to `gzclose'
<artificial>:(.text+0x3c07): undefined reference to `gzclose'
<artificial>:(.text+0x415e): undefined reference to `gzclose'
<artificial>:(.text+0x41b3): undefined reference to `gzclose'
/tmp/cchPfbS8.ltrans2.ltrans.o: In function `bmf::depth_main(int, char**)':
<artificial>:(.text+0x418): undefined reference to `gzopen'
<artificial>:(.text+0xd67): undefined reference to `gzclose'
/tmp/cchPfbS8.ltrans6.ltrans.o: In function `bmf::stranded_hash_dmp_core(char*, char*, int)':
<artificial>:(.text+0x69): undefined reference to `gzopen'
<artificial>:(.text+0x8d): undefined reference to `gzopen'
<artificial>:(.text+0x1bb9): undefined reference to `gzputs'
<artificial>:(.text+0x1d87): undefined reference to `gzputs'
<artificial>:(.text+0x1e98): undefined reference to `gzclose'
<artificial>:(.text+0x1ea2): undefined reference to `gzclose'
<artificial>:(.text+0x249e): undefined reference to `gzclose'
<artificial>:(.text+0x24a8): undefined reference to `gzclose'
/tmp/cchPfbS8.ltrans12.ltrans.o: In function `bmf::init_splitter(bmf::marksplit_settings_t*)':
<artificial>:(.text+0x1ef): undefined reference to `gzopen'
<artificial>:(.text+0x205): undefined reference to `gzopen'
<artificial>:(.text+0x33e): undefined reference to `gzopen'
/tmp/cchPfbS8.ltrans14.ltrans.o: In function `bmf::hash_dmp_core(char*, char*, int)':
<artificial>:(.text+0x98): undefined reference to `gzopen'
<artificial>:(.text+0xba): undefined reference to `gzdopen'
<artificial>:(.text+0x1654): undefined reference to `gzputs'
<artificial>:(.text+0x19d8): undefined reference to `gzclose'
<artificial>:(.text+0x19e2): undefined reference to `gzclose'
<artificial>:(.text+0x1adb): undefined reference to `gzclose'
<artificial>:(.text+0x1b12): undefined reference to `gzclose'
<artificial>:(.text+0x1b24): undefined reference to `gzclose'
/tmp/cchPfbS8.ltrans17.ltrans.o: In function `dlib::open_gzfile(char*, char const*)':
<artificial>:(.text+0x4c3): undefined reference to `gzdopen'
<artificial>:(.text+0x4d1): undefined reference to `gzopen'
/tmp/cchPfbS8.ltrans20.ltrans.o: In function `ks_getuntil2(__kstream_t*, int, __kstring_t*, int*, int) [clone .constprop.79]':
<artificial>:(.text+0xf4): undefined reference to `gzread'
/tmp/cchPfbS8.ltrans21.ltrans.o: In function `dlib::parse_bed_hash(char const*, bam_hdr_t*, unsigned int)':
<artificial>:(.text+0x90c): undefined reference to `gzopen'
<artificial>:(.text+0x94c): undefined reference to `gzgets'
<artificial>:(.text+0xa53): undefined reference to `gzclose'
libhts.a(hts.o): In function `decompress_peek':
/share/tools/ngs-tools/BMFtools/htslib/hts.c:142: undefined reference to `inflateInit2_'
/share/tools/ngs-tools/BMFtools/htslib/hts.c:145: undefined reference to `inflate'
/share/tools/ngs-tools/BMFtools/htslib/hts.c:148: undefined reference to `inflateEnd'
/share/tools/ngs-tools/BMFtools/htslib/hts.c:148: undefined reference to `inflateEnd'
libhts.a(vcf.o): In function `vcf_hdr_read':
/share/tools/ngs-tools/BMFtools/htslib/vcf.c:1288: undefined reference to `gzopen'
libhts.a(vcf.o): In function `ks_getc':
/share/tools/ngs-tools/BMFtools/htslib/vcf.c:49: undefined reference to `gzread'
libhts.a(vcf.o): In function `vcf_hdr_read':
/share/tools/ngs-tools/BMFtools/htslib/vcf.c:1301: undefined reference to `gzclose'
libhts.a(cram_io.o): In function `zlib_mem_deflate':
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:782: undefined reference to `deflateInit2_'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:796: undefined reference to `deflate'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:803: undefined reference to `deflate'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:808: undefined reference to `deflateEnd'
libhts.a(cram_io.o): In function `itf8_decode_crc':
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:216: undefined reference to `crc32'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:225: undefined reference to `crc32'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:195: undefined reference to `crc32'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:201: undefined reference to `crc32'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:208: undefined reference to `crc32'
libhts.a(cram_io.o):/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:543: more undefined references to `crc32' follow
libhts.a(cram_io.o): In function `zlib_mem_inflate':
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:718: undefined reference to `inflateInit2_'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:731: undefined reference to `inflate'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:751: undefined reference to `inflateEnd'
libhts.a(cram_io.o): In function `cram_read_block':
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:946: undefined reference to `crc32'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:948: undefined reference to `crc32'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:980: undefined reference to `crc32'
libhts.a(cram_io.o): In function `cram_write_block':
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:1049: undefined reference to `crc32'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:1054: undefined reference to `crc32'
libhts.a(cram_io.o):/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:3092: more undefined references to `crc32' follow
libhts.a(zfio.o): In function `zfputs':
/share/tools/ngs-tools/BMFtools/htslib/cram/zfio.c:80: undefined reference to `gzputs'
libhts.a(zfio.o): In function `zfpeek':
/share/tools/ngs-tools/BMFtools/htslib/cram/zfio.c:97: undefined reference to `gzungetc'
/share/tools/ngs-tools/BMFtools/htslib/cram/zfio.c:95: undefined reference to `gzgetc'
libhts.a(zfio.o): In function `zfopen':
/share/tools/ngs-tools/BMFtools/htslib/cram/zfio.c:165: undefined reference to `gzopen'
/share/tools/ngs-tools/BMFtools/htslib/cram/zfio.c:169: undefined reference to `gzopen'
libhts.a(zfio.o): In function `zfclose':
/share/tools/ngs-tools/BMFtools/htslib/cram/zfio.c:180: undefined reference to `gzclose'
libhts.a(zfio.o): In function `zfgets':
/share/tools/ngs-tools/BMFtools/htslib/cram/zfio.c:69: undefined reference to `gzgets'
libhts.a(zfio.o): In function `zfeof':
/share/tools/ngs-tools/BMFtools/htslib/cram/zfio.c:105: undefined reference to `gzeof'
libhts.a(bgzf.o): In function `bgzf_write_init':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:166: undefined reference to `deflateInit2_'
libhts.a(bgzf.o): In function `inflate_gzip_block':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:335: undefined reference to `inflate'
libhts.a(bgzf.o): In function `bgzf_open':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:174: undefined reference to `compressBound'
libhts.a(bgzf.o): In function `bgzf_dopen':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:196: undefined reference to `compressBound'
libhts.a(bgzf.o): In function `bgzf_hopen':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:218: undefined reference to `compressBound'
libhts.a(bgzf.o): In function `bgzf_compress':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:244: undefined reference to `deflateInit2_'
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:245: undefined reference to `deflate'
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:246: undefined reference to `deflateEnd'
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:252: undefined reference to `crc32'
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:252: undefined reference to `crc32'
libhts.a(bgzf.o): In function `bgzf_gzip_compress':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:267: undefined reference to `deflate'
libhts.a(bgzf.o): In function `inflate_block':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:302: undefined reference to `inflateInit2_'
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:306: undefined reference to `inflate'
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:311: undefined reference to `inflateEnd'
libhts.a(bgzf.o): In function `bgzf_read_block':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:505: undefined reference to `inflateInit2_'
libhts.a(bgzf.o): In function `inflate_block':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:307: undefined reference to `inflateEnd'
libhts.a(bgzf.o): In function `bgzf_close':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:816: undefined reference to `inflateEnd'
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:817: undefined reference to `deflateEnd'
collect2: error: ld returned 1 exit status
make: *** [bmftools] Error 1
This problem was likely fixed in some recent changes in the Makefile. Can you switch to branch dev and rebuild?
Thanks for reporting!
@dnbh
$ git checkout dev
$ make clean
$ make
collect2: error: ld returned 1 exit status
Makefile:98: recipe for target 'test/ucs/ucs_test' failed
make: *** [test/ucs/ucs_test] Error 1
make: *** Waiting for unfinished jobs....
It seems like your make isn't being helpful. Let's make it emit the commands that it's running before it runs them:
make clean && make -j1 SHELL="bash -x"
This should force it to emit the commands it's calling. The -lz flag could be in the wrong spot.
Alternatively, your zlib shared library (libz.so) might not be in your LD_LIBRARY_PATH.
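One way to check the link-order hypothesis: GNU ld reads its inputs left to right, and a static archive (or a shared library linked with `--as-needed`, which some distributions' toolchains enable by default) is generally only used to resolve symbols that are already undefined at the point where it appears. A `-lz` placed before `libhts.a` may therefore leave the zlib symbols referenced by the archive unresolved. Below is a minimal sketch, assuming you have captured the failing link command as a string. The helper name `libs_before_inputs` and the example commands are hypothetical and are not part of BMFtools.

```python
import shlex


def libs_before_inputs(link_cmd):
    """Return the -l flags that appear before the last object/archive input."""
    tokens = shlex.split(link_cmd)
    skip_next = False
    input_positions = []
    lib_positions = []
    for i, tok in enumerate(tokens):
        if skip_next:  # this token is the argument of -o, not an input file
            skip_next = False
            continue
        if tok == "-o":
            skip_next = True
        elif tok.startswith("-l"):
            lib_positions.append((i, tok))
        elif tok.endswith((".o", ".a")):
            input_positions.append(i)
    if not input_positions:
        return []
    last_input = input_positions[-1]
    return [tok for i, tok in lib_positions if i < last_input]


# Hypothetical commands: libraries first (suspicious) vs. libraries last
cmd = "g++ -fopenmp -Ihtslib -lm -lz -lpthread main.o libhts.a -o prog"
fixed = "g++ -fopenmp -Ihtslib main.o libhts.a -lz -lm -lpthread -o prog"
print(libs_before_inputs(cmd))    # ['-lm', '-lz', '-lpthread']
print(libs_before_inputs(fixed))  # []
```

If the helper reports `-lz` for your failing command, moving the `-l` flags to the end of the link line in the Makefile is worth trying.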
I have zlib in my library path.
After running `make clean && make -j1 SHELL="bash -x"`, the error remains.
I tried the command above in both zsh and bash.
What is the difference between BMFtools and UMI-tools? https://github.com/CGATOxford/UMI-tools
When I run make, it gives me a list of commands it is executing. Does yours not?
For example, at the start,
gcc-mp-6 -c -Wuninitialized -Wunreachable-code -Wall -fopenmp -DBMF_VERSION=\"v1.1-39-ga448\" -std=gnu99 -fno-builtin-gamma -pedantic -Ihtslib -Iinclude -I. -lm -lz -lpthread -Wno-unused-result -finline-functions -O3 -DNDEBUG -flto -fivopts -Wno-unused-function -Wno-strict-aliasing -fno-builtin-gamma include/sam_opts.c -o include/sam_opts.o
g++-mp-6 -c -Wuninitialized -Wunreachable-code -Wall -fopenmp -DBMF_VERSION=\"v1.1-39-ga448\" -std=c++11 -fno-builtin-gamma -pedantic -Ihtslib -Iinclude -I. -lm -lz -lpthread -Wno-unused-result -finline-functions -O3 -DNDEBUG -flto -fivopts -Wno-unused-function -Wno-strict-aliasing -fno-builtin-gamma src/bmf_collapse.cpp -o src/bmf_collapse.o
gcc-mp-6 -c -Wuninitialized -Wunreachable-code -Wall -fopenmp -DBMF_VERSION=\"v1.1-39-ga448\" -std=gnu99 -fno-builtin-gamma -pedantic -Ihtslib -Iinclude -I. -lm -lz -lpthread -Wno-unused-result -finline-functions -O3 -DNDEBUG -flto -fivopts -Wno-unused-function -Wno-strict-aliasing -fno-builtin-gamma include/igamc_cephes.c -o include/igamc_cephes.o
g++-mp-6 -c -Wuninitialized -Wunreachable-code -Wall -fopenmp -DBMF_VERSION=\"v1.1-39-ga448\" -std=c++11 -fno-builtin-gamma -pedantic -Ihtslib -Iinclude -I. -lm -lz -lpthread -Wno-unused-result -finline-functions -O3 -DNDEBUG -flto -fivopts -Wno-unused-function -Wno-strict-aliasing -fno-builtin-gamma lib/hashdmp.cpp -o lib/hashdmp.o
Can you see what command it is executing before it fails?
UMItools is not designed for performance or error correction, but primarily library complexity. Additionally, BMFtools emphasizes rigorous statistics and provides a wide range of analytical functionality. Further details include working with multiple experimental designs, including inline barcodes.
The make process stops here:
**g++ -Wuninitialized -Wunreachable-code -Wall -fopenmp -DBMF_VERSION=\"v1.1-29-g0dff\" -std=c++11 -fno-builtin-gamma -pedantic -Ihtslib -Iinclude -I. -lm -lz -lpthread -Wno-unused-result -Wno-unused-function -Wno-strict-aliasing -pedantic -fno-builtin-gamma -fno-inline test/ucs/ucs_test.dbo libhts.a -o test/ucs/ucs_test**
libhts.a(hts.o): In function `decompress_peek':
/share/tools/ngs-tools/BMFtools/htslib/hts.c:142: undefined reference to `inflateInit2_'
/share/tools/ngs-tools/BMFtools/htslib/hts.c:145: undefined reference to `inflate'
/share/tools/ngs-tools/BMFtools/htslib/hts.c:148: undefined reference to `inflateEnd'
/share/tools/ngs-tools/BMFtools/htslib/hts.c:148: undefined reference to `inflateEnd'
libhts.a(cram_io.o): In function `zlib_mem_deflate':
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:782: undefined reference to `deflateInit2_'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:796: undefined reference to `deflate'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:803: undefined reference to `deflate'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:808: undefined reference to `deflateEnd'
libhts.a(cram_io.o): In function `itf8_decode_crc':
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:216: undefined reference to `crc32'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:225: undefined reference to `crc32'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:195: undefined reference to `crc32'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:201: undefined reference to `crc32'
A weird thing happened.
I ran `make` again and again without `make clean`.
The binary file, bmftools, appeared, although the error messages still exist.
Running `./bmftools` seems fine.
$ ./bmftools
Usage: bmftools <subcommand>. See subcommand menus for usage.
-v/--version: Print bmftools version and exit.
cap: Modifies the quality string as function of family metadata.
depth: Calculates depth of coverage over a set of bed intervals.
collapse: Collapses fastq records by barcodes.
err: Calculate error rates based on cycle, base call, and quality score.
famstats: Calculate family size statistics for a bam alignment file.
filter: Filter or split a bam file by a set of filters.
mark: Add tags including unclipped start positions.
rsq: Rescue reads with using positional inference to collapse to unique observations in spite of errors in the barcode sequence.
sort: Sort for bam rescue.
stack: A maximally-permissive yet statistically-thorough variant caller using molecular barcode metadata.
target: Calculates on-target rate.
vet: Curate variant calls from another variant caller (.bcf) and an indexed alignment file.
I probably need to update the commands for the unit tests. Thanks!
Hi @dnbh, is there a quick-start manual to follow?
$ ./bmftools collapse
Collapses initial fastq by exact barcode matching.
Subcommands:
inline: Inline barcoded chemistry.
secondary: Secondary barcoded chemistry.
https://github.com/ARUP-NGS/BMFtools/blob/master/MANUAL.md#exact-matching-fastq-consolidation
Start here. You collapse the fastqs using inline or secondary depending on your chemistry, then align as detailed below. You can optionally perform a rescue step for barcode errors. For variant-calling, you can preprocess with bmftools filter, curate calls by another variant caller using bmftools vet, or you can produce a pileup-like vcf using bmftools stack.
I do recommend piping the bwa output to bmftools mark whether or not you plan to perform rescue.
thank you @dnbh
Hi @dnbh, sorry for disturbing you. I don't know what the homing sequence means in this program.
-s <homing_sequence>
The homing sequence is a sequence of bases marking the end of the random nucleotides which make up the barcode.
My sequence is NNNNNNNNNNNNNNNNNNNNGAGCTCA......
The sequence beside my UMI is `GAGCTCA`.
So, what should I input? Just `GAGCTCA`, or something else?
Would you please show me a demo?
$ bmftools collapse inline \
-s GAGCTCA -l 20 -o bmftemp -p 10 -f bmftest ./processed.1.fastq ./processed.2.fastq
[pp_split_inline] Collapsing 418077 initial read pairs....
[1] 32745 segmentation fault (core dumped) bmftools collapse inline
-s GAGCTCA -l 20 -o bmftemp -p 10 -f bmftest
20 seems awfully long for a -l parameter. That means that you're using the first 20 bases from each read and concatenating them to form a barcode, then masking the next 7 bases from each read. That leaves you with a 40bp barcode. You are, however, using the homing sequence correctly.
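To make the arithmetic concrete, here is a minimal Python sketch of the behaviour described above: take the first `-l` bases of each read, concatenate them into one barcode, and mask the homing-sequence bases that follow. It is an illustration only, not BMFtools' implementation. The function name `inline_barcode`, the use of `N` as the mask character, and the toy reads are assumptions, and what happens to the barcode bases themselves is not modelled.

```python
def inline_barcode(read1, read2, blen, homing):
    """Build a combined barcode from both reads and mask the homing bases."""
    hlen = len(homing)
    barcode = read1[:blen] + read2[:blen]          # 2 * blen bases in total
    homing_ok = read1[blen:blen + hlen] == homing  # is the homing sequence where we expect it?

    def mask(read):
        # Replace the homing-sequence positions with N, keep everything else
        return read[:blen] + "N" * hlen + read[blen + hlen:]

    return barcode, homing_ok, mask(read1), mask(read2)


# Toy reads (assumed): 20 barcode bases, the 7-base homing sequence, then insert
r1 = "A" * 20 + "GAGCTCA" + "TTTT"
r2 = "C" * 20 + "GAGCTCA" + "GGGG"
barcode, ok, m1, m2 = inline_barcode(r1, r2, blen=20, homing="GAGCTCA")
print(len(barcode))  # 40
print(ok)            # True
```

With `-l 20` and a 7-base homing sequence, the combined barcode is 2 × 20 = 40 bases, which matches the 40bp figure above.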
Are your reads of uniform length? If not, this can cause problems. If they are, can you upload these or a subset of the files for me to investigate?
Thank you!
The reads are of uniform length. I only use a single-end barcode, and the barcode is 20 bp long.
The unique barcode and the Illumina sequencing index are added by PCR amplification, so the barcode is exactly the first 20 bases at the 5' end of read 1.
Can BMFtools deal with such a design?
As it is, bmftools supports only Loeb-like inline chemistries. A patch wouldn't be hard. Alternatively, I could write a preprocessing tool which prepares your data for processing through secondary. Inline assumes duplex.
Can you send me the first 40k lines or so of each fastq?
You might be running into a problem with temporary filenames. There seems to have been a bug in random string name generation. Can you try setting the -o parameter to 'tmpfiles' to see if it eliminates the segfault?
I'll work on the patch in the morning.
Thank you very much, dnbh.
These are the files.
The reference sequence is in `gb.tar.gz` format. The first 10k read pairs are in `fq.gz` format.
ref.gb.tar.gz yech_R2.fq.gz yech_R1.fq.gz
When I change the `-l` option to 10, it works, but I don't know what that means.
@dnbh It is an irregular design. I think that in most cases a single-end barcode can exist in either read 1 or read 2, because most sequencing library preparation protocols use a ligation method instead of a PCR method.
BTW, is `maskripper` used for cutting PCR primers?
It can if you've masked them using a tool like cutadapt. It can't otherwise; it just trims masked bases. You should only trim after collapse (if not rescuing) or after rescue (if performing rescue). Otherwise, it messes with a lot of the assumptions of the software.
My patch is ssi2sec, which you can make by fetching from this repository. You run it on two fastq files, specify -2 if read 2 is the one with the barcode, and it produces two fastqs for reads and a secondary index read file. You can then run the full pipeline through the secondary pipeline, where it won't try to perform duplex collapsing.
I was able to collapse your reads as follows:
ssi2sec yech_R1.fq yech_r2.fq o1.fq o2.fq oi.fq
bmftools collapse secondary -otmpfiles -fyechcol -i oi.fq o1.fq o2.fq
Use this tag.
Thank you for your help.
I am cloning your repository now. Due to the slow network speed, it may take hours to finish.
I wonder whether the only difference between your fork and the original repository is the binary `ssi2sec`? Can I just copy ssi2sec.cpp and compile it?
BTW, what is ssi2sec short for?
edit:
ssi2sec -l20 yech_R1.fq yech_r2.fq o1.fq o2.fq oi.fq
bmftools collapse secondary -otmpfiles -fyechcol -i oi.fq o1.fq o2.fq
Single-strand index to secondary. Makes a fake secondary index chemistry dataset, masks the appropriate portion of the read which the barcode came from, and allows you to perform your analysis there on out without a hitch. And it's simple enough code that it doesn't cost much runtime.
I also fixed the bug which caused problems with barcode lengths beyond 15 for inline chemistry and changed random string generation to be more robust, but those changes aren't essential. You could also add my fork as a remote and just pull from it, avoiding copying anything but the diffs.
It should compile if you copy and paste, though. I'd just say to be careful and provide arguments to the -o parameter during collapse.
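For readers who want a picture of what the conversion does, here is a rough Python sketch of the idea as described in this thread: the leading barcode bases of one read are copied out as a separate index read, and that portion of the source read is masked. This is not the ssi2sec source. The function name `to_secondary`, the `N` mask character, and the toy sequences are assumptions, and quality strings are ignored for brevity.

```python
def to_secondary(read1, read2, blen, barcode_in_read2=False):
    """Split one barcoded read pair into (read1, read2, index) sequences."""
    src = read2 if barcode_in_read2 else read1
    index = src[:blen]                # the barcode becomes the index read
    masked = "N" * blen + src[blen:]  # mask the bases the barcode came from
    if barcode_in_read2:
        return read1, masked, index
    return masked, read2, index


# Toy pair (assumed): 20-base barcode at the 5' end of read 1
r1 = "ACGTACGTACGTACGTACGT" + "GAGCTCATTTT"
r2 = "GGGGCCCCAAAA"
o1, o2, oi = to_secondary(r1, r2, blen=20)
print(oi)  # ACGTACGTACGTACGTACGT
```

The three return values correspond to the o1.fq, o2.fq and oi.fq outputs in the commands above; `barcode_in_read2=True` plays the role of the `-2` switch.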
Hi Daniel,
I ran `ssi2sec -l20 yech_R1.fq yech_r2.fq o1.fq o2.fq oi.fq`.
2 hours have passed, and the program is still running.
Oh, my mistake. I had a lowercase r in r2 in that command-line, so it just wasn't reading because of the mismatched filename. Sorry! I also found a mistake (outputting the sequence for read 1 to both handles) which I fixed; can you try pulling again?
ok, I am pulling the repository now. :)
Hi Daniel, the C++ code you wrote is extremely fast. I have never written C++ before, and this urges me to learn it now.
I designed a pipeline a few days ago and wrote a Python script to do it.
The pipeline:
1. Split the raw data by target.
2. Move the UMI and index into the FASTQ read description.
3. Map the reads to the corresponding reference and pass the tags to the SAM file.
*4. Calculate the consensus sequence for reads with the same barcode and write it to a new BAM file.
*5. Call mutations from the consensus BAM file.
Is this a reasonable pipeline?
I have finished the first three steps now. Can I use BMFtools to finish the 4th and 5th steps?
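As a reference point for step 4, the simplest possible form of barcode consensus is a column-wise majority vote within each barcode family. The sketch below assumes reads are available as `(barcode, sequence)` pairs of equal length. It is a naive illustration with a hypothetical function name and made-up reads, and it does not reflect the statistical model BMFtools uses.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads):
    """reads: iterable of (barcode, sequence); returns {barcode: consensus}."""
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)
    result = {}
    for barcode, seqs in families.items():
        # Column-wise majority vote; zip truncates to the shortest read
        result[barcode] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return result


# Made-up reads: three share a barcode, one of them carries an error
reads = [
    ("AAAC", "ACGT"),
    ("AAAC", "ACGA"),
    ("AAAC", "ACGT"),
    ("GGTT", "TTTT"),
]
print(consensus_by_barcode(reads))  # {'AAAC': 'ACGT', 'GGTT': 'TTTT'}
```

A plain vote ignores base qualities and barcode errors, which is where a dedicated tool earns its keep.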
python code for the 2nd step
```python
from Bio import SeqIO


def interleave(iter1, iter2):
    """Read paired files together and move index, barcode and tag into the description."""
    for forward, reverse in zip(iter1, iter2):  # zip replaces Python 2's itertools.izip
        assert forward.id == reverse.id
        description = "IX:Z:{}\tBC:Z:{}\tTG:Z:{}".format(
            forward.description.split(":")[9],  # 10th colon-separated header field
            reverse[-8:].seq,                   # last 8 bases of the reverse read
            forward[:20].seq)                   # first 20 bases of the forward read
        forward.description = description
        reverse.description = description
        # Trim the extracted bases off the reads
        forward = forward[20:]
        reverse = reverse[:-8]
        yield forward, reverse


# file_f, file_r, file_f_out, file_r_out and file_format are set elsewhere in the script
records_f = SeqIO.parse(file_f, file_format)
records_r = SeqIO.parse(file_r, file_format)

with open(file_f_out, "w") as handle_f_out, open(file_r_out, "w") as handle_r_out:
    for fwd, rev in interleave(records_f, records_r):
        SeqIO.write(fwd, handle_f_out, file_format)
        SeqIO.write(rev, handle_r_out, file_format)
```
> CLI for 3rd step
```shell
bwa mem -C -t 20 ../ref/target_in_this_experiment.fa ./test_out_R1.fq test_out_R2.fq > test.sam
```
SAM file output of the 3rd step: hello.sam.gz
I figured out that I was completely wrong.
It seems I should just follow this pipeline.
I have example scripts in the repo for how you would use BMFtools in a workflow, and I would have thought the manual would be sufficient instruction without them.
Their software may provide the functionality you need. BMFtools, on the other hand, has real performance, functionality, and statistical rigor advantages. It was built for a production environment and scalability, for which these Python scripts weren't.
I can also get you a snakemake workflow.
On Sun, Dec 11, 2016 at 3:04 AM yech notifications@github.com wrote:
I figured out that I was completely wrong.
It seems I should just follow this pipeline http://presto.readthedocs.io/en/latest/workflows/Stern2014_Workflow.html .
I have a perhaps similar issue to the one @yech1990 reported at the beginning of the thread: BMFtools' Makefile fails with a bunch of "undefined reference to" errors for various symbols (gzread, gzclose, deflateInit2_, deflate, crc32 and more). The last three lines:
collect2: error: ld returned 1 exit status
Makefile:93: recipe for target 'bmftools' failed
make: *** [bmftools] Error 1
I tried the suggestions in this thread, but in vain. Any ideas?