dpryan79 / SE-MEI

Tools for finding mobile element insertions from single-end datasets
GNU General Public License v2.0

how to compile #4

Open maheydari opened 5 years ago

maheydari commented 5 years ago

Could you please write up the minimal steps needed to compile this program? The reason I ask is that I am looking for an example of how to use the samtools C API. I saw some of your code that uses this API, but I couldn't compile it.

dpryan79 commented 5 years ago

    git clone --recurse-submodules https://github.com/dpryan79/SE-MEI
    cd SE-MEI
    make

Or something along those lines.

maheydari commented 5 years ago

Thanks for your answer; I could compile the code successfully. However, the htslib folder you provide with the project is an older version. I downloaded the latest version, compiled it, and replaced your older version with it. Now when I try to compile your code again, it gives me errors like these:

/home/mahdi/SE-MEI2/htslib/hfile_libcurl.c:1138: undefined reference to `curl_easy_setopt'
/home/mahdi/SE-MEI2/htslib/hfile_libcurl.c:1124: undefined reference to `curl_multi_init'
/home/mahdi/SE-MEI2/htslib/hfile_libcurl.c:1127: undefined reference to `curl_easy_init'
/home/mahdi/SE-MEI2/htslib/cram/cram_io.c:966: undefined reference to `BZ2_bzBuffToBuffDecompress'

Could you please help me understand what I did wrong? Can I later install htslib in my home directory rather than in the subdirectory, and link it to this program?

dpryan79 commented 5 years ago

Don't download the latest version. This is an old project, it was written for the old htslib version it comes with.

maheydari commented 5 years ago

Thank you very much. Your comments are always helpful, especially in the Biostars community. I want to write a very simple program that reads a BAM file using the latest version of htslib, and I wanted to get some ideas from your code on how to do that. If you have any suggestions, please let me know.

dpryan79 commented 5 years ago

Ah, well you can get the gist of things from my code. The htslib API hasn't really changed that drastically since I wrote this.

maheydari commented 5 years ago

Indeed, it's very useful. I am looking for functionality similar to seekg or tellg while reading a BAM file. I don't want to parse the reads from the beginning; instead, I want to jump to random places in the file and parse some of the reads there. I couldn't find this in the project. Do you have any experience with something similar?

dpryan79 commented 5 years ago

You can seek to arbitrary blocks, likely using non-exported functions from htslib, but you can't just say "seek to read 123456", since the files aren't structured in a way to make that practical. I suggest you either use reservoir sampling or read the header and then sample random intervals. The files are structured such that it's quick to get reads overlapping a given interval. We use that methodology in deepTools when we need to randomly sample reads.
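For readers unfamiliar with the first suggestion, here is a minimal sketch of reservoir sampling (Algorithm R). It is not code from SE-MEI or deepTools. The `Reservoir` class and its `offer`, `sample`, and `seen` members are hypothetical names for illustration, and plain integers stand in for the records a BAM read loop would supply.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Algorithm R: keep a uniform random sample of up to k items from a stream
// whose length is not known in advance. In a BAM reader, offer() would be
// called once per record; here items are integers standing in for reads
// (an assumption made only for illustration).
class Reservoir {
public:
    Reservoir(std::size_t k, std::uint64_t seed) : k_(k), rng_(seed) {}

    void offer(std::int64_t item) {
        ++seen_;
        if (sample_.size() < k_) {
            sample_.push_back(item);  // fill phase: keep the first k items
            return;
        }
        // Pick a slot in [0, seen_ - 1]; the new item replaces an old one
        // only if the slot falls inside the reservoir (probability k / seen_).
        std::uniform_int_distribution<std::uint64_t> pick(0, seen_ - 1);
        const std::uint64_t j = pick(rng_);
        if (j < k_) {
            sample_[j] = item;
        }
    }

    const std::vector<std::int64_t>& sample() const { return sample_; }
    std::uint64_t seen() const { return seen_; }

private:
    std::size_t k_;
    std::uint64_t seen_ = 0;
    std::mt19937_64 rng_;
    std::vector<std::int64_t> sample_;
};
```

This still requires one pass over every read, so it trades speed for an exactly uniform sample. Sampling random intervals through the index avoids the full pass but is only approximately uniform.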

maheydari commented 5 years ago

I arrived at a solution yesterday, which is probably what you are suggesting. I was looking at this project: https://github.com/hasindu2008/simple_bam_parser/blob/master/randomacess.c The only change I made was to replace the string-based interval with two integers that specify the start and end (are the start and end actually the first and last read in the chunk, or did I misunderstand?). Here is my code (not all of it, just the relevant part):

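    // Open the BAM file and load its index.
    samFile *in = sam_open(argv[1], "r");
    hts_idx_t *idx = sam_index_load(in, argv[1]);

    // Query reference 0 (the first sequence in the header), start 10, end 100.
    hts_itr_t *iter = sam_itr_queryi(idx, 0, 10, 100);

    // Print the leftmost position of each read returned by the iterator.
    bam1_t *b = bam_init1();
    while (sam_itr_next(in, iter, b) >= 0) {
        cout << b->core.pos << endl;
    }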

Is that correct? Also, does the sam_index_load function load the whole BAM file into memory? I see a sudden jump in memory usage when I call it.

I really appreciate you taking the time to answer my random questions.

Update: I found your answer to a similar question here, which is also very similar to the above project: https://www.biostars.org/p/151053/

dpryan79 commented 5 years ago

It only loads the index into memory, which is rather small. The memory jump likely has more to do with how many reads are in the bgzip blocks, since you have to load one of those into memory to decompress it.
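On the earlier question of what the two integers mean: as far as I understand the htslib headers, the start and end passed to sam_itr_queryi are coordinates on the reference sequence (0-based, half-open), not read numbers, and the iterator returns reads that overlap that range. The sketch below illustrates only that coordinate convention and does not call htslib. `Interval`, `from_one_based`, and `overlaps` are hypothetical helpers, and the convention itself should be checked against the htslib version in use.

```cpp
#include <cassert>
#include <cstdint>

// Half-open, 0-based interval on one reference sequence: [beg, end).
struct Interval {
    std::int64_t beg;
    std::int64_t end;
};

// Convert a 1-based inclusive range, as written in a region string such as
// "chr1:11-100", to the 0-based half-open form.
Interval from_one_based(std::int64_t start1, std::int64_t end1) {
    return Interval{start1 - 1, end1};
}

// True when an alignment covering [read_beg, read_end) on the reference
// overlaps the query interval. A read may start before q.beg and still
// overlap it, so reported read positions can be smaller than q.beg.
bool overlaps(std::int64_t read_beg, std::int64_t read_end, const Interval& q) {
    return read_beg < q.end && read_end > q.beg;
}
```

Under this assumption, the region string "chr1:11-100" and the integer pair (10, 100) on the first reference describe the same range.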

markziemann commented 4 years ago

Hi @dpryan79, I'm using this as a dependency of ERVtools and ran into compilation problems too. I eventually found a solution, but it took a while: htslib 1.1 works fine with this package. https://github.com/samtools/htslib/releases/tag/1.1 Could you mention this in the readme.md? Thanks

dpryan79 commented 4 years ago

@markziemann I've mentioned that in the readme just now. The submodule itself should actually be using 1.1 already, though I guess updating it would cause an issue. If you end up needing a newer htslib then let me know and I'll try to update this (I haven't really used this in a number of years).