ARUP-NGS / BMFtools

Barcoded Molecular Families
MIT License

make failed #93

Closed y9c closed 7 years ago

y9c commented 7 years ago
...
libhts.a(bgzf.o): In function `inflate_block':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:302: undefined reference to `inflateInit2_'
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:306: undefined reference to `inflate'
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:311: undefined reference to `inflateEnd'
libhts.a(bgzf.o): In function `bgzf_read_block':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:505: undefined reference to `inflateInit2_'
libhts.a(bgzf.o): In function `inflate_block':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:307: undefined reference to `inflateEnd'
libhts.a(bgzf.o): In function `bgzf_close':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:816: undefined reference to `inflateEnd'
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:817: undefined reference to `deflateEnd'
collect2: error: ld returned 1 exit status
Makefile:93: recipe for target 'bmftools' failed
make: *** [bmftools] Error 1
y9c commented 7 years ago
uname -a
Linux 94 4.4.0-51-generic #72-Ubuntu SMP Thu Nov 24 18:29:54 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
dnbaker commented 7 years ago

What's your commit tag, and what is the command-line that is failing? It seems to be failing to load zlib.

y9c commented 7 years ago

@dnbh I just ran make after git clone.

y9c commented 7 years ago

the full log

HEAD is now at f90ae8b... Add zr2
/tmp/cchPfbS8.ltrans0.ltrans.o: In function `kseq_read(kseq_t*) [clone .lto_priv.291] [clone .lto_priv.295]':
<artificial>:(.text+0xaf9): undefined reference to `gzread'
<artificial>:(.text+0xcae): undefined reference to `gzread'
<artificial>:(.text+0xd84): undefined reference to `gzread'
<artificial>:(.text+0xeb2): undefined reference to `gzread'
<artificial>:(.text+0xf04): undefined reference to `gzread'
/tmp/cchPfbS8.ltrans0.ltrans.o: In function `bmf::sdmp_main(int, char**) [clone .part.13]':
<artificial>:(.text+0x27d8): undefined reference to `gzopen'
<artificial>:(.text+0x27f2): undefined reference to `gzopen'
<artificial>:(.text+0x2845): undefined reference to `gzopen'
<artificial>:(.text+0x2a6d): undefined reference to `gzopen'
<artificial>:(.text+0x2a87): undefined reference to `gzopen'
<artificial>:(.text+0x2c77): undefined reference to `gzprintf'
<artificial>:(.text+0x2e02): undefined reference to `gzprintf'
<artificial>:(.text+0x2e38): undefined reference to `gzclose'
<artificial>:(.text+0x2e42): undefined reference to `gzclose'
<artificial>:(.text+0x2e5d): undefined reference to `gzclose'
<artificial>:(.text+0x303e): undefined reference to `gzprintf'
<artificial>:(.text+0x308a): undefined reference to `gzprintf'
<artificial>:(.text+0x3291): undefined reference to `gzprintf'
<artificial>:(.text+0x32c3): undefined reference to `gzprintf'
<artificial>:(.text+0x330d): undefined reference to `gzclose'
<artificial>:(.text+0x3317): undefined reference to `gzclose'
<artificial>:(.text+0x3321): undefined reference to `gzclose'
<artificial>:(.text+0x3343): undefined reference to `gzclose'
<artificial>:(.text+0x3354): undefined reference to `gzclose'
/tmp/cchPfbS8.ltrans0.ltrans.o: In function `bmf::idmp_main(int, char**)':
<artificial>:(.text+0x3afd): undefined reference to `gzopen'
<artificial>:(.text+0x3e19): undefined reference to `gzprintf'
<artificial>:(.text+0x402e): undefined reference to `gzprintf'
<artificial>:(.text+0x4084): undefined reference to `gzclose'
<artificial>:(.text+0x40b4): undefined reference to `gzclose'
<artificial>:(.text+0x4459): undefined reference to `gzopen'
<artificial>:(.text+0x4473): undefined reference to `gzopen'
<artificial>:(.text+0x47e2): undefined reference to `gzprintf'
<artificial>:(.text+0x482f): undefined reference to `gzprintf'
<artificial>:(.text+0x4c82): undefined reference to `gzprintf'
<artificial>:(.text+0x4cb6): undefined reference to `gzprintf'
<artificial>:(.text+0x4e02): undefined reference to `gzprintf'
/tmp/cchPfbS8.ltrans0.ltrans.o:<artificial>:(.text+0x4e35): more undefined references to `gzprintf' follow
/tmp/cchPfbS8.ltrans0.ltrans.o: In function `bmf::idmp_main(int, char**)':
<artificial>:(.text+0x4ea0): undefined reference to `gzclose'
<artificial>:(.text+0x4eb5): undefined reference to `gzclose'
<artificial>:(.text+0x4ef5): undefined reference to `gzclose'
<artificial>:(.text+0x4eff): undefined reference to `gzclose'
<artificial>:(.text+0x5005): undefined reference to `gzprintf'
/tmp/cchPfbS8.ltrans1.ltrans.o: In function `bmf::hash_inmem_inline_core(char*, char*, char*, char*, char*, int, int, int, int, int)':
<artificial>:(.text+0x162): undefined reference to `gzdopen'
<artificial>:(.text+0x186): undefined reference to `gzdopen'
<artificial>:(.text+0xc3d): undefined reference to `gzclose'
<artificial>:(.text+0xc4a): undefined reference to `gzclose'
<artificial>:(.text+0x271b): undefined reference to `gzputs'
<artificial>:(.text+0x273c): undefined reference to `gzputs'
<artificial>:(.text+0x3aa5): undefined reference to `gzputs'
<artificial>:(.text+0x3aff): undefined reference to `gzputs'
<artificial>:(.text+0x3bfa): undefined reference to `gzclose'
<artificial>:(.text+0x3c07): undefined reference to `gzclose'
<artificial>:(.text+0x415e): undefined reference to `gzclose'
<artificial>:(.text+0x41b3): undefined reference to `gzclose'
/tmp/cchPfbS8.ltrans2.ltrans.o: In function `bmf::depth_main(int, char**)':
<artificial>:(.text+0x418): undefined reference to `gzopen'
<artificial>:(.text+0xd67): undefined reference to `gzclose'
/tmp/cchPfbS8.ltrans6.ltrans.o: In function `bmf::stranded_hash_dmp_core(char*, char*, int)':
<artificial>:(.text+0x69): undefined reference to `gzopen'
<artificial>:(.text+0x8d): undefined reference to `gzopen'
<artificial>:(.text+0x1bb9): undefined reference to `gzputs'
<artificial>:(.text+0x1d87): undefined reference to `gzputs'
<artificial>:(.text+0x1e98): undefined reference to `gzclose'
<artificial>:(.text+0x1ea2): undefined reference to `gzclose'
<artificial>:(.text+0x249e): undefined reference to `gzclose'
<artificial>:(.text+0x24a8): undefined reference to `gzclose'
/tmp/cchPfbS8.ltrans12.ltrans.o: In function `bmf::init_splitter(bmf::marksplit_settings_t*)':
<artificial>:(.text+0x1ef): undefined reference to `gzopen'
<artificial>:(.text+0x205): undefined reference to `gzopen'
<artificial>:(.text+0x33e): undefined reference to `gzopen'
/tmp/cchPfbS8.ltrans14.ltrans.o: In function `bmf::hash_dmp_core(char*, char*, int)':
<artificial>:(.text+0x98): undefined reference to `gzopen'
<artificial>:(.text+0xba): undefined reference to `gzdopen'
<artificial>:(.text+0x1654): undefined reference to `gzputs'
<artificial>:(.text+0x19d8): undefined reference to `gzclose'
<artificial>:(.text+0x19e2): undefined reference to `gzclose'
<artificial>:(.text+0x1adb): undefined reference to `gzclose'
<artificial>:(.text+0x1b12): undefined reference to `gzclose'
<artificial>:(.text+0x1b24): undefined reference to `gzclose'
/tmp/cchPfbS8.ltrans17.ltrans.o: In function `dlib::open_gzfile(char*, char const*)':
<artificial>:(.text+0x4c3): undefined reference to `gzdopen'
<artificial>:(.text+0x4d1): undefined reference to `gzopen'
/tmp/cchPfbS8.ltrans20.ltrans.o: In function `ks_getuntil2(__kstream_t*, int, __kstring_t*, int*, int) [clone .constprop.79]':
<artificial>:(.text+0xf4): undefined reference to `gzread'
/tmp/cchPfbS8.ltrans21.ltrans.o: In function `dlib::parse_bed_hash(char const*, bam_hdr_t*, unsigned int)':
<artificial>:(.text+0x90c): undefined reference to `gzopen'
<artificial>:(.text+0x94c): undefined reference to `gzgets'
<artificial>:(.text+0xa53): undefined reference to `gzclose'
libhts.a(hts.o): In function `decompress_peek':
/share/tools/ngs-tools/BMFtools/htslib/hts.c:142: undefined reference to `inflateInit2_'
/share/tools/ngs-tools/BMFtools/htslib/hts.c:145: undefined reference to `inflate'
/share/tools/ngs-tools/BMFtools/htslib/hts.c:148: undefined reference to `inflateEnd'
/share/tools/ngs-tools/BMFtools/htslib/hts.c:148: undefined reference to `inflateEnd'
libhts.a(vcf.o): In function `vcf_hdr_read':
/share/tools/ngs-tools/BMFtools/htslib/vcf.c:1288: undefined reference to `gzopen'
libhts.a(vcf.o): In function `ks_getc':
/share/tools/ngs-tools/BMFtools/htslib/vcf.c:49: undefined reference to `gzread'
libhts.a(vcf.o): In function `vcf_hdr_read':
/share/tools/ngs-tools/BMFtools/htslib/vcf.c:1301: undefined reference to `gzclose'
libhts.a(cram_io.o): In function `zlib_mem_deflate':
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:782: undefined reference to `deflateInit2_'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:796: undefined reference to `deflate'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:803: undefined reference to `deflate'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:808: undefined reference to `deflateEnd'
libhts.a(cram_io.o): In function `itf8_decode_crc':
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:216: undefined reference to `crc32'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:225: undefined reference to `crc32'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:195: undefined reference to `crc32'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:201: undefined reference to `crc32'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:208: undefined reference to `crc32'
libhts.a(cram_io.o):/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:543: more undefined references to `crc32' follow
libhts.a(cram_io.o): In function `zlib_mem_inflate':
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:718: undefined reference to `inflateInit2_'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:731: undefined reference to `inflate'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:751: undefined reference to `inflateEnd'
libhts.a(cram_io.o): In function `cram_read_block':
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:946: undefined reference to `crc32'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:948: undefined reference to `crc32'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:980: undefined reference to `crc32'
libhts.a(cram_io.o): In function `cram_write_block':
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:1049: undefined reference to `crc32'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:1054: undefined reference to `crc32'
libhts.a(cram_io.o):/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:3092: more undefined references to `crc32' follow
libhts.a(zfio.o): In function `zfputs':
/share/tools/ngs-tools/BMFtools/htslib/cram/zfio.c:80: undefined reference to `gzputs'
libhts.a(zfio.o): In function `zfpeek':
/share/tools/ngs-tools/BMFtools/htslib/cram/zfio.c:97: undefined reference to `gzungetc'
/share/tools/ngs-tools/BMFtools/htslib/cram/zfio.c:95: undefined reference to `gzgetc'
libhts.a(zfio.o): In function `zfopen':
/share/tools/ngs-tools/BMFtools/htslib/cram/zfio.c:165: undefined reference to `gzopen'
/share/tools/ngs-tools/BMFtools/htslib/cram/zfio.c:169: undefined reference to `gzopen'
libhts.a(zfio.o): In function `zfclose':
/share/tools/ngs-tools/BMFtools/htslib/cram/zfio.c:180: undefined reference to `gzclose'
libhts.a(zfio.o): In function `zfgets':
/share/tools/ngs-tools/BMFtools/htslib/cram/zfio.c:69: undefined reference to `gzgets'
libhts.a(zfio.o): In function `zfeof':
/share/tools/ngs-tools/BMFtools/htslib/cram/zfio.c:105: undefined reference to `gzeof'
libhts.a(bgzf.o): In function `bgzf_write_init':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:166: undefined reference to `deflateInit2_'
libhts.a(bgzf.o): In function `inflate_gzip_block':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:335: undefined reference to `inflate'
libhts.a(bgzf.o): In function `bgzf_open':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:174: undefined reference to `compressBound'
libhts.a(bgzf.o): In function `bgzf_dopen':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:196: undefined reference to `compressBound'
libhts.a(bgzf.o): In function `bgzf_hopen':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:218: undefined reference to `compressBound'
libhts.a(bgzf.o): In function `bgzf_compress':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:244: undefined reference to `deflateInit2_'
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:245: undefined reference to `deflate'
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:246: undefined reference to `deflateEnd'
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:252: undefined reference to `crc32'
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:252: undefined reference to `crc32'
libhts.a(bgzf.o): In function `bgzf_gzip_compress':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:267: undefined reference to `deflate'
libhts.a(bgzf.o): In function `inflate_block':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:302: undefined reference to `inflateInit2_'
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:306: undefined reference to `inflate'
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:311: undefined reference to `inflateEnd'
libhts.a(bgzf.o): In function `bgzf_read_block':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:505: undefined reference to `inflateInit2_'
libhts.a(bgzf.o): In function `inflate_block':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:307: undefined reference to `inflateEnd'
libhts.a(bgzf.o): In function `bgzf_close':
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:816: undefined reference to `inflateEnd'
/share/tools/ngs-tools/BMFtools/htslib/bgzf.c:817: undefined reference to `deflateEnd'
collect2: error: ld returned 1 exit status
make: *** [bmftools] Error 1
dnbaker commented 7 years ago

This problem was likely fixed in some recent changes in the Makefile. Can you switch to branch dev and rebuild?

Thanks for reporting!

y9c commented 7 years ago

@dnbh

$ git checkout dev
$ make clean
$ make
collect2: error: ld returned 1 exit status
Makefile:98: recipe for target 'test/ucs/ucs_test' failed
make: *** [test/ucs/ucs_test] Error 1
make: *** Waiting for unfinished jobs....
dnbaker commented 7 years ago

It seems like your make isn't being helpful. Let's make it emit the commands that it's running before it runs them:

make clean && make -j1 SHELL="bash -x"

This should force it to emit the commands it's calling. The -lz flag could be in the wrong spot.

Alternatively, your zlib.so might not be in your LD_LIBRARY_PATH.

y9c commented 7 years ago

I have zlib in my library path.

After running make clean && make -j1 SHELL="bash -x", the error remains.

I tried the command above in both zsh and bash.

y9c commented 7 years ago

What is the difference between BMFtools and UMI-tools? https://github.com/CGATOxford/UMI-tools

dnbaker commented 7 years ago

When I run make, it gives me a list of commands it is executing. Does yours not?

For example, at the start,

gcc-mp-6 -c -Wuninitialized -Wunreachable-code -Wall -fopenmp -DBMF_VERSION=\"v1.1-39-ga448\" -std=gnu99 -fno-builtin-gamma -pedantic -Ihtslib -Iinclude -I.  -lm -lz -lpthread -Wno-unused-result -finline-functions -O3 -DNDEBUG -flto -fivopts -Wno-unused-function -Wno-strict-aliasing -fno-builtin-gamma include/sam_opts.c -o include/sam_opts.o
g++-mp-6 -c -Wuninitialized -Wunreachable-code -Wall -fopenmp -DBMF_VERSION=\"v1.1-39-ga448\" -std=c++11 -fno-builtin-gamma -pedantic   -Ihtslib -Iinclude -I.  -lm -lz -lpthread -Wno-unused-result -finline-functions -O3 -DNDEBUG -flto -fivopts -Wno-unused-function -Wno-strict-aliasing -fno-builtin-gamma src/bmf_collapse.cpp -o src/bmf_collapse.o
gcc-mp-6 -c -Wuninitialized -Wunreachable-code -Wall -fopenmp -DBMF_VERSION=\"v1.1-39-ga448\" -std=gnu99 -fno-builtin-gamma -pedantic -Ihtslib -Iinclude -I.  -lm -lz -lpthread -Wno-unused-result -finline-functions -O3 -DNDEBUG -flto -fivopts -Wno-unused-function -Wno-strict-aliasing -fno-builtin-gamma include/igamc_cephes.c -o include/igamc_cephes.o
g++-mp-6 -c -Wuninitialized -Wunreachable-code -Wall -fopenmp -DBMF_VERSION=\"v1.1-39-ga448\" -std=c++11 -fno-builtin-gamma -pedantic   -Ihtslib -Iinclude -I.  -lm -lz -lpthread -Wno-unused-result -finline-functions -O3 -DNDEBUG -flto -fivopts -Wno-unused-function -Wno-strict-aliasing -fno-builtin-gamma lib/hashdmp.cpp -o lib/hashdmp.o

Can you see what command it is executing before it fails?

UMI-tools is designed primarily for library complexity rather than for performance or error correction. BMFtools also emphasizes rigorous statistics and provides a wide range of analytical functionality, and it works with multiple experimental designs, including inline barcodes.

y9c commented 7 years ago

The make process stops here:

g++ -Wuninitialized -Wunreachable-code -Wall -fopenmp -DBMF_VERSION=\"v1.1-29-g0dff\" -std=c++11 -fno-builtin-gamma -pedantic   -Ihtslib -Iinclude -I.  -lm -lz -lpthread -Wno-unused-result -Wno-unused-function -Wno-strict-aliasing -pedantic -fno-builtin-gamma -fno-inline test/ucs/ucs_test.dbo libhts.a -o test/ucs/ucs_test
libhts.a(hts.o): In function `decompress_peek':
/share/tools/ngs-tools/BMFtools/htslib/hts.c:142: undefined reference to `inflateInit2_'
/share/tools/ngs-tools/BMFtools/htslib/hts.c:145: undefined reference to `inflate'
/share/tools/ngs-tools/BMFtools/htslib/hts.c:148: undefined reference to `inflateEnd'
/share/tools/ngs-tools/BMFtools/htslib/hts.c:148: undefined reference to `inflateEnd'
libhts.a(cram_io.o): In function `zlib_mem_deflate':
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:782: undefined reference to `deflateInit2_'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:796: undefined reference to `deflate'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:803: undefined reference to `deflate'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:808: undefined reference to `deflateEnd'
libhts.a(cram_io.o): In function `itf8_decode_crc':
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:216: undefined reference to `crc32'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:225: undefined reference to `crc32'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:195: undefined reference to `crc32'
/share/tools/ngs-tools/BMFtools/htslib/cram/cram_io.c:201: undefined reference to `crc32'
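
A note on the failing command above: -lm -lz -lpthread appear before test/ucs/ucs_test.dbo and libhts.a. Linkers such as GNU ld typically resolve static archives from left to right, so a library named before the object or archive that needs it may not be used to satisfy those references. The Python sketch below is only a heuristic for spotting that pattern in a logged command. The function name libs_listed_too_early and the shortened command are illustrative and not part of BMFtools or its Makefile.

```python
import shlex


def libs_listed_too_early(command):
    """Return the -l flags that appear before the last object file or archive.

    Heuristic only: it assumes a left-to-right linker and treats any argument
    ending in .o, .a or .dbo (other than the -o output) as a link input.
    """
    args = shlex.split(command)
    input_positions = []
    skip_next = False
    for pos, arg in enumerate(args):
        if skip_next:            # the argument after -o is the output file
            skip_next = False
            continue
        if arg == "-o":
            skip_next = True
            continue
        if arg.endswith((".o", ".a", ".dbo")):
            input_positions.append(pos)
    if not input_positions:
        return []
    last_input = max(input_positions)
    return [arg for arg in args[:last_input] if arg.startswith("-l")]


# Shortened form of the command from the log above.
cmd = ("g++ -std=c++11 -Ihtslib -lm -lz -lpthread "
       "test/ucs/ucs_test.dbo libhts.a -o test/ucs/ucs_test")
print(libs_listed_too_early(cmd))  # ['-lm', '-lz', '-lpthread']
```

If this is the cause, moving the -l flags to the end of the link command, after libhts.a, would be the usual fix.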
y9c commented 7 years ago

A weird thing happened.

I ran make again and again without make clean. The binary file, bmftools, appeared, although the error messages still show up.

Running ./bmftools seems fine.

$ ./bmftools
Usage: bmftools <subcommand>. See subcommand menus for usage.
-v/--version:            Print bmftools version and exit.
cap:                     Modifies the quality string as function of family metadata.
depth:                   Calculates depth of coverage over a set of bed intervals.
collapse:                Collapses fastq records by barcodes.
err:                     Calculate error rates based on cycle, base call, and quality score.
famstats:                Calculate family size statistics for a bam alignment file.
filter:                  Filter or split a bam file by a set of filters.
mark:                    Add tags including unclipped start positions.
rsq:                     Rescue reads with using positional inference to collapse to unique observations in spite of errors in the barcode sequence.
sort:                    Sort for bam rescue.
stack:                   A maximally-permissive yet statistically-thorough variant caller using molecular barcode metadata.
target:                  Calculates on-target rate.
vet:                     Curate variant calls from another variant caller (.bcf) and an indexed alignment file.
dnbaker commented 7 years ago

I probably need to update the commands for the unit tests. Thanks!

y9c commented 7 years ago

hi @dnbh Is there any quick start manual to follow?


 $ ./bmftools collapse      
Collapses initial fastq by exact barcode matching.
Subcommands:
inline: Inline barcoded chemistry.
secondary: Secondary barcoded chemistry.
dnbaker commented 7 years ago

https://github.com/ARUP-NGS/BMFtools/blob/master/MANUAL.md#exact-matching-fastq-consolidation

Start here. You collapse the fastqs using inline or secondary depending on your chemistry, then align as detailed below. You can optionally perform a rescue step for barcode errors. For variant-calling, you can preprocess with bmftools filter, curate calls by another variant caller using bmftools vet, or you can produce a pileup-like vcf using bmftools stack.

I do recommend piping the bwa output to bmftools mark whether or not you plan to perform rescue.
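
The order of steps described above can be written down as a small Python sketch. It only lists tool and subcommand names taken from this thread and from the bmftools usage text; no command-line flags are implied. The function name plan_stages and its parameters are hypothetical, and mapping the rescue step to bmftools rsq is inferred from the usage text rather than stated in the manual excerpt.

```python
def plan_stages(chemistry, rescue=False, variant_step="stack"):
    """List the processing stages in the order described in the comment above.

    chemistry:    "inline" or "secondary", depending on the barcode chemistry.
    rescue:       whether to add the optional rescue step for barcode errors.
    variant_step: "filter", "vet" or "stack".
    """
    if chemistry not in ("inline", "secondary"):
        raise ValueError("chemistry must be 'inline' or 'secondary'")
    if variant_step not in ("filter", "vet", "stack"):
        raise ValueError("variant_step must be 'filter', 'vet' or 'stack'")
    stages = [
        "bmftools collapse " + chemistry,
        "bwa (align the collapsed reads)",
        "bmftools mark (fed from the bwa output)",
    ]
    if rescue:
        # Assumption: "rescue" corresponds to the rsq subcommand.
        stages.append("bmftools rsq")
    stages.append("bmftools " + variant_step)
    return stages


for stage in plan_stages("inline", rescue=True, variant_step="vet"):
    print(stage)
```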

y9c commented 7 years ago

thank you @dnbh

y9c commented 7 years ago

Hi @dnbh, sorry for disturbing you. I don't know what the homing sequence means in this program.

-s <homing_sequence>

The homing sequence is a sequence of bases marking the end of the random nucleotides which make up the barcode.

My sequence is NNNNNNNNNNNNNNNNNNNNGAGCTCA...... The sequence beside my UMI is GAGCTCA. So, what should I input? Just GAGCTCA or something else? Would you please show me a demo?

y9c commented 7 years ago
$bmftools collapse inline \
-s GAGCTCA -l 20 -o bmftemp -p 10 -f bmftest ./processed.1.fastq ./processed.2.fastq

[pp_split_inline] Collapsing 418077 initial read pairs....
[1]    32745 segmentation fault (core dumped)  bmftools collapse inline 
-s GAGCTCA -l 20 -o bmftemp -p 10 -f bmftest
dnbaker commented 7 years ago

20 seems awfully long for a -l parameter. That means that you're using the first 20 bases from each read and concatenating them into a barcode, then masking the next 7 bases from each read. That leaves you with a 40bp barcode. You are, however, using the homing sequence correctly.
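
To make the arithmetic concrete, here is a minimal Python sketch of what the paragraph above describes: take the first -l bases of each read, concatenate them into one barcode, and mask the homing sequence that follows. It is not the bmftools implementation. The function name inline_barcode is hypothetical, and both overwriting with N and masking the barcode bases along with the homing sequence are assumptions about what "masking" means here.

```python
def inline_barcode(read1, read2, barcode_len, homing):
    """Build the barcode for one read pair and mask barcode plus homing bases.

    Returns (barcode, masked_read1, masked_read2), or None when the homing
    sequence does not sit directly after the barcode in both reads.
    """
    end = barcode_len + len(homing)
    for read in (read1, read2):
        if read[barcode_len:end] != homing:
            return None
    # The barcode is the concatenation of the first barcode_len bases of each read.
    barcode = read1[:barcode_len] + read2[:barcode_len]
    mask = "N" * end  # assumption: masked bases are replaced with N
    return barcode, mask + read1[end:], mask + read2[end:]


r1 = "ACGTACGTACGTACGTACGT" + "GAGCTCA" + "TTTTCCCC"
r2 = "TGCATGCATGCATGCATGCA" + "GAGCTCA" + "GGGGAAAA"
barcode, masked1, masked2 = inline_barcode(r1, r2, 20, "GAGCTCA")
print(len(barcode))  # 40
```

With -l 20 and a 7-base homing sequence, the barcode is 40 bases long and the first 27 bases of each read are masked.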

Are your reads of uniform length? If not, this can cause problems. If they are, can you upload these or a subset of the files for me to investigate?

Thank you!

y9c commented 7 years ago

The reads are of uniform length. I only use a single-end barcode, and the barcode is 20bp in length.

y9c commented 7 years ago

The unique barcode and Illumina sequencing index are added by PCR amplification, and thus the barcode is exactly the first 20 bases at the 5' end of read 1.

Can BMFtools deal with such a design?


dnbaker commented 7 years ago

As it is, bmftools supports only Loeb-like inline chemistries. A patch wouldn't be hard. Alternatively, I could write a preprocessing tool which prepares your data for processing through secondary. Inline assumes duplex.

Can you send me the first 40k lines or so of each fastq?

dnbaker commented 7 years ago

You might be running into a problem with temporary filenames. There seems to have been a bug in random string name generation. Can you try setting the -o parameter to 'tmpfiles' to see if it eliminates the segfault?

I'll work on the patch in the morning.

y9c commented 7 years ago

Thank you very much, dnbh.

These are the files.

The reference sequence is in gb.tar.gz format. The first 10k read pairs are in fq.gz format.


ref.gb.tar.gz yech_R2.fq.gz yech_R1.fq.gz


When I change the -l option to 10, it works, but I don't know what that means.

y9c commented 7 years ago

@dnbh It is an irregular design. I think in most cases a single-end barcode can be in either read 1 or read 2, because most sequencing library preparation protocols use ligation rather than PCR.

y9c commented 7 years ago

BTW, is maskripper used for cutting PCR primers?

dnbaker commented 7 years ago

It can if you've masked them using a tool like cutadapt. It can't otherwise; it just trims masked bases. You should only trim after collapse (if not rescuing) or after rescue (if performing rescue). Otherwise, it messes with a lot of the assumptions of the software.

dnbaker commented 7 years ago

My patch is ssi2sec, which you can make by fetching from this repository. You run it on two fastq files, specify -2 if read 2 is the one with the barcode, and it produces two fastqs for reads and a secondary index read file. You can then run the full pipeline through the secondary pipeline, where it won't try to perform duplex collapsing.

dnbaker commented 7 years ago

I was able to collapse your reads as follows:

ssi2sec yech_R1.fq yech_r2.fq o1.fq o2.fq oi.fq

bmftools collapse secondary -otmpfiles -fyechcol -i oi.fq o1.fq o2.fq

Use this tag.

y9c commented 7 years ago

Thank you for your help. I am cloning your repository now. Due to slow network speed, it may take hours to finish. Is the only difference between your fork and the original repository the ssi2sec binary? Can I just copy ssi2sec.cpp and compile it? BTW, what is ssi2sec short for?

y9c commented 7 years ago

edit:

ssi2sec  -l20  yech_R1.fq yech_r2.fq o1.fq o2.fq oi.fq
bmftools collapse secondary -otmpfiles -fyechcol -i oi.fq o1.fq o2.fq
dnbaker commented 7 years ago

Single-strand index to secondary. Makes a fake secondary index chemistry dataset, masks the appropriate portion of the read which the barcode came from, and allows you to perform your analysis there on out without a hitch. And it's simple enough code that it doesn't cost much runtime.

I also fixed the bug which caused problems with barcode lengths beyond 15 for inline chemistry and changed random string generation to be more robust, but those changes aren't essential. You could also add my fork as a remote and just pull from it, avoiding copying anything but the diffs.
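
For readers who want to see the idea rather than the C++ source, the single-strand-index-to-secondary conversion described above can be sketched in Python as follows. This is not the ssi2sec code. The function name to_secondary and the tuple layout are hypothetical, barcode_len and barcode_in_read2 mirror the -l and -2 options mentioned in this thread, and masking with N bases and '#' qualities is an assumption.

```python
def to_secondary(rec1, rec2, barcode_len, barcode_in_read2=False):
    """Turn one read pair into (read1, read2, index_read) records.

    Each record is a (name, sequence, quality) tuple. The barcode is copied
    into a separate index record and masked in the read it came from.
    """
    name, seq, qual = rec2 if barcode_in_read2 else rec1
    if len(seq) < barcode_len or len(qual) != len(seq):
        raise ValueError("read is shorter than the barcode or has bad qualities")
    index = (name, seq[:barcode_len], qual[:barcode_len])
    # Assumption: masked bases become N with the lowest printable quality '#'.
    masked = (name,
              "N" * barcode_len + seq[barcode_len:],
              "#" * barcode_len + qual[barcode_len:])
    if barcode_in_read2:
        return rec1, masked, index
    return masked, rec2, index


rec1 = ("read0", "ACGTACGT", "IIIIIIII")
rec2 = ("read0", "TTTTGGGG", "IIIIIIII")
out1, out2, out_index = to_secondary(rec1, rec2, 4)
print(out1, out2, out_index)
```

The three outputs correspond to the two read fastqs and the secondary index fastq that are then passed to bmftools collapse secondary.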

dnbaker commented 7 years ago

It should compile if you copy and paste, though. I'd just say to be careful and provide arguments to the -o parameter during collapse.

y9c commented 7 years ago

Hi Daniel, I ran ssi2sec -l20 yech_R1.fq yech_r2.fq o1.fq o2.fq oi.fq. Two hours have passed, and the program is still running.

dnbaker commented 7 years ago

Oh, my mistake. I had a lowercase r in r2 in that command-line, so it just wasn't reading because of the mismatched filename. Sorry! I also found a mistake (outputting the sequence for read 1 to both handles) which I fixed; can you try pulling again?

y9c commented 7 years ago

ok, I am pulling the repository now. :)

y9c commented 7 years ago

Hi Daniel, the C++ code you wrote is extremely fast. I have never written C++ before, and this urges me to learn it now.

I designed a pipeline a few days ago and wrote a Python script for it.

The pipeline:

1. Split the raw data by target.
2. Move the UMI and index into the fastq read description.
3. Map the reads to the corresponding reference and pass the tags to the SAM file.
*4. Calculate the consensus sequence for reads with the same barcode and write it to a new BAM file.
*5. Call mutations from the consensus BAM file.

Is this a reasonable pipeline?
I have finished the first three steps now. Can I use BMFtools to finish the 4th and 5th steps?
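
To illustrate what step 4 means, here is a minimal Python sketch of a per-barcode consensus using a plain majority vote at each position. It assumes reads in a family are already the same length and in register, and it ignores base qualities and alignment. It is only a stand-in to show the idea, not the statistical method BMFtools uses, and the function name consensus_by_barcode is hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads):
    """Collapse (barcode, sequence) pairs into one consensus per barcode.

    Ties are broken in favour of the base seen first within the family.
    """
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)
    result = {}
    for barcode, seqs in families.items():
        if len({len(s) for s in seqs}) != 1:
            raise ValueError("reads in family %s differ in length" % barcode)
        # zip(*seqs) walks the family column by column.
        result[barcode] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs))
    return result


reads = [("AAAA", "ACGT"), ("AAAA", "ACGT"), ("AAAA", "ACTT"), ("CCCC", "GGGG")]
print(consensus_by_barcode(reads))  # {'AAAA': 'ACGT', 'CCCC': 'GGGG'}
```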


python code for the 2nd step


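```python
from Bio import SeqIO


def interleave(iter1, iter2):
    """Read paired files together and move the tags into the description."""
    # zip replaces Python 2's itertools.izip; it is already lazy in Python 3.
    for forward, reverse in zip(iter1, iter2):
        assert forward.id == reverse.id
        description = "IX:Z:{}\tBC:Z:{}\tTG:Z:{}".format(
            forward.description.split(":")[9],  # 10th colon-separated header field
            reverse[-8:].seq,                   # last 8 bases of the reverse read
            forward[:20].seq)                   # first 20 bases of the forward read
        forward.description = description
        reverse.description = description
        # Trim the bases that were copied into the tags.
        forward = forward[20:]
        reverse = reverse[:-8]
        yield forward, reverse


# file_f, file_r, file_f_out, file_r_out and file_format are set elsewhere in the script.
with open(file_f) as in_f, open(file_r) as in_r, \
        open(file_f_out, "w") as handle_f_out, \
        open(file_r_out, "w") as handle_r_out:
    records_f = SeqIO.parse(in_f, file_format)
    records_r = SeqIO.parse(in_r, file_format)
    for forward, reverse in interleave(records_f, records_r):
        SeqIO.write(forward, handle_f_out, file_format)
        SeqIO.write(reverse, handle_r_out, file_format)
```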


> CLI for 3rd step
```shell
bwa mem -C -t 20 ../ref/target_in_this_experiment.fa ./test_out_R1.fq test_out_R2.fq > test.sam
```

sam file output in 3rd step: hello.sam.gz

y9c commented 7 years ago

I figured out I was completely wrong.

It seems I should just follow this pipeline: http://presto.readthedocs.io/en/latest/workflows/Stern2014_Workflow.html

dnbaker commented 7 years ago

I have example scripts in the repo for how you would use BMFtools in a workflow, and I would have thought the manual would be sufficient instruction without them.

Their software may provide the functionality you need. BMFtools, on the other hand, has real performance, functionality, and statistical rigor advantages. It was built for a production environment and scalability, for which these Python scripts weren't.

I can also get you a snakemake workflow.

On Sun, Dec 11, 2016 at 3:04 AM yech notifications@github.com wrote:

I figure out I am completely wrong.

I seems I should just follow this pipeline http://presto.readthedocs.io/en/latest/workflows/Stern2014_Workflow.html .


madsheilskov commented 7 years ago

I have a perhaps similar issue to the one @yech1990 had at the beginning of the thread: building BMFtools fails with a bunch of "undefined reference to" errors for various symbols (gzread, gzclose, deflateInit2_, deflate, crc32 and more). The last three lines:

collect2: error: ld returned 1 exit status
Makefile:93: recipe for target 'bmftools' failed
make: *** [bmftools] Error 1

I tried the suggestions in this thread, but in vain. Any ideas?

dnbaker commented 7 years ago

Try pulling instead from here on my personal fork, which I've been maintaining.