duncanca / mosaik-aligner

Automatically exported from code.google.com/p/mosaik-aligner
0 stars 0 forks source link

Integer owerflow #82

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
1.
Download Homo_sapiens.GRCh37.60.dna.toplevel.fa.gz from 
ftp://ftp.ensembl.org/pub/current/fasta/homo_sapiens/dna/
2.
MosaikBuild -fr Homo_sapiens.GRCh37.60.dna.toplevel.fa.gz -oa 
Homo_sapiens.GRCh37.60.dna.toplevel.dat

If the problems I reported previously will be solved, there is another serious 
bug. 

void CMosaikBuild::CreateReferenceArchive(const string& fastaFilename, const 
string& archiveFilename) {

    // initialize
    vector<ReferenceSequence> references;
    unsigned int concatenatedLength = 0;

When the total number of bases is bigger than UINT_MAX which is the case at 
Homo Sapiens genome, program will fail on memcpy because of

concatenatedReference = new char[concatenatedLength + 1]; allocates 
unsufficient memory. 
(Me personally dont understand why +1 is present)

It would be nice not to copy anything to such structures and only work with the 
hardly allocated references structure itself. 

I would do this and post you source code but I dont know precise dat file 
specification and it would probably be contraproductive. But if you point me to 
the right direction, I will do the bugfix. 

Original issue reported on code.google.com by kulv...@gmail.com on 25 Nov 2010 at 3:12

GoogleCodeExporter commented 8 years ago
Hi,

+1 is for appending a null character (\0).

This's a cruel case for MOSAIK. I'd say using larger integer, e.g. uint64_t, 
may solve this problem.

Since the following step, MosaikAligner, is also a customer of 
concatenatedReference, throwing concatenatedReference out would cause a huge 
work.

I am playing this case and will employ uint64_t to resolve this problem.

Best,
Wan-Ping

Original comment by WanPing....@gmail.com on 30 Nov 2010 at 10:41

GoogleCodeExporter commented 8 years ago
Thank you for accepting it and explanation.
Yes, uint64_t is a good choice, because size of int in x64 .NET and VS 
applications is 4 Bytes for backward compatibility and on Linux AMD64 it is 
normally 8 Bytes.

Original comment by kulv...@gmail.com on 16 Dec 2010 at 12:22