CMU-SAFARI / SneakySnake

SneakySnake is the first and only pre-alignment filtering algorithm that works efficiently on modern CPU, FPGA, and GPU architectures. It expedites sequence alignment calculation by more than two orders of magnitude for both short and long reads. Described in Bioinformatics (2020) by Alser et al.: https://arxiv.org/abs/1910.09020
GNU General Public License v3.0

List commands and expected outputs #1

Closed: zjin-lcf closed this issue 3 years ago

zjin-lcf commented 3 years ago

Could you please list the commands you used for performance evaluation and the expected results for each dataset available in your repository?

I am not familiar with your domain, but the output at https://github.com/CMU-SAFARI/SneakySnake/tree/master/Snake-on-GPU is for 3,000, not 30,000. Is that right?

When "Number_of_warps_inside_each_block" is not equal to 32, should we expect the same output?

For the following error, can I generate an index file using the command "mrfast --index human_g1k_v37.fasta"?

WARNING: ./human_g1k_v37.fasta.fai not found, the SAM file(s) will not have a header.
You can generate the .fai file using samtools. Please place it in the same directory with the index to enable SAM headers.
Reference genome index file read error.

Thanks

mealser commented 3 years ago

@zjin-lcf Please check Section 3.1 in the paper (https://arxiv.org/pdf/1910.09020.pdf) for the dataset description. We used two datasets, each of which has 30 million sequence pairs. You can run it on any number of sequences using the last parameter.

We uploaded all the datasets we used in our evaluation here: https://zenodo.org/record/4537807. Please use them directly, as shown on the page you mentioned:

make
./Snake-on-GPU [ReadLength] [ReadandRefFile] [#reads]

We are also working on updating the Snake-on-GPU implementation to support long sequences of any length. Currently, SneakySnake supports sequences of any length as we evaluate in Figures 10 and 11 (https://arxiv.org/pdf/1910.09020.pdf).

"Number_of_warps_inside_each_block" will affect the parallelism, so you may get a different execution time. The GPU card we used for the evaluation is "NVIDIA GeForce RTX 2080Ti". You can still use any NVIDIA card though.

zjin-lcf commented 3 years ago

Thanks for your answers!

Do you expect the same results for different numbers of warps per block? I used the small dataset in your repo for the test.

When the number of warps in a block is 8, the output is:
E: 0 Snake-on-GPU: 970.0000 Accepted: 296 Rejected: 29704
E: 1 Snake-on-GPU: 874.0000 Accepted: 947 Rejected: 29053
E: 2 Snake-on-GPU: 868.0000 Accepted: 2127 Rejected: 27873
E: 3 Snake-on-GPU: 901.0000 Accepted: 3750 Rejected: 26250
E: 4 Snake-on-GPU: 921.0000 Accepted: 5613 Rejected: 24387
E: 5 Snake-on-GPU: 937.0000 Accepted: 7404 Rejected: 22596
E: 6 Snake-on-GPU: 957.0000 Accepted: 9400 Rejected: 20600
E: 7 Snake-on-GPU: 993.0000 Accepted: 11442 Rejected: 18558
E: 8 Snake-on-GPU: 1020.0000 Accepted: 13505 Rejected: 16495
E: 9 Snake-on-GPU: 1077.0000 Accepted: 15610 Rejected: 14390
E: 10 Snake-on-GPU: 1104.0000 Accepted: 17492 Rejected: 12508

When the number of warps in a block is 32, the output is:
E: 0 Snake-on-GPU: 1012.0000 Accepted: 294 Rejected: 29706
E: 1 Snake-on-GPU: 922.0000 Accepted: 925 Rejected: 29075
E: 2 Snake-on-GPU: 919.0000 Accepted: 2044 Rejected: 27956
E: 3 Snake-on-GPU: 946.0000 Accepted: 3601 Rejected: 26399
E: 4 Snake-on-GPU: 976.0000 Accepted: 5415 Rejected: 24585
E: 5 Snake-on-GPU: 990.0000 Accepted: 7176 Rejected: 22824
E: 6 Snake-on-GPU: 1031.0000 Accepted: 9160 Rejected: 20840
E: 7 Snake-on-GPU: 1065.0000 Accepted: 11193 Rejected: 18807
E: 8 Snake-on-GPU: 1120.0000 Accepted: 13255 Rejected: 16745
E: 9 Snake-on-GPU: 1168.0000 Accepted: 15359 Rejected: 14641
E: 10 Snake-on-GPU: 1218.0000 Accepted: 17240 Rejected: 12760
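To make the mismatch between the two runs easier to see, the outputs can be parsed and compared per edit-distance threshold E. The following is a minimal sketch, not part of the SneakySnake repository: the helper names `parse_results` and `accepted_diff` are hypothetical, and the record layout is assumed to match the text pasted in this thread. Only the first three records of each run are included for brevity.

```python
import re

# Matches one record as printed in this thread, e.g.
# "E: 0 Snake-on-GPU: 970.0000 Accepted: 296 Rejected: 29704".
# The layout is an assumption based on the pasted output only.
RECORD = re.compile(
    r"E:\s*(\d+)\s+Snake-on-GPU:\s*([\d.]+)\s+Accepted:\s*(\d+)\s+Rejected:\s*(\d+)"
)


def parse_results(text):
    """Return {E: (accepted, rejected)} for every record found in text."""
    return {
        int(e): (int(accepted), int(rejected))
        for e, _time, accepted, rejected in RECORD.findall(text)
    }


def accepted_diff(run_a, run_b):
    """Return {E: accepted_a - accepted_b} for thresholds present in both runs."""
    return {e: run_a[e][0] - run_b[e][0] for e in sorted(run_a.keys() & run_b.keys())}


# First three records of each run, copied from the comment above.
run_8_warps = """
E: 0 Snake-on-GPU: 970.0000 Accepted: 296 Rejected: 29704
E: 1 Snake-on-GPU: 874.0000 Accepted: 947 Rejected: 29053
E: 2 Snake-on-GPU: 868.0000 Accepted: 2127 Rejected: 27873
"""
run_32_warps = """
E: 0 Snake-on-GPU: 1012.0000 Accepted: 294 Rejected: 29706
E: 1 Snake-on-GPU: 922.0000 Accepted: 925 Rejected: 29075
E: 2 Snake-on-GPU: 919.0000 Accepted: 2044 Rejected: 27956
"""

diff = accepted_diff(parse_results(run_8_warps), parse_results(run_32_warps))
print(diff)  # {0: 2, 1: 22, 2: 83}
```

In every record, Accepted plus Rejected equals 30000, so both runs processed all pairs; only the accept/reject split differs between the two settings.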

zjin-lcf commented 3 years ago

Could you please provide the expected result for the dataset (e.g., ./Datasets/ERR240727_1_E2_30000Pairs.txt) running on a GPU for verification? I observe that the numbers differ significantly on GPUs from different vendors.

mealser commented 3 years ago

Can you run it on the 30 million pairs of ERR240727_1_E2_30million.zip (called 100bp_1 in the paper) or ERR240727_1_E40_30million.zip (called 100bp_2 in the paper)? You will find the exact filtering results in Table 7 of https://arxiv.org/pdf/1910.09020.pdf.

I just wonder what the purpose of using different numbers of warps per block is. Which GPU are you using? I don't have much information about what you are trying to achieve, but in case you want to debug the output: you can set "Number_of_warps_inside_each_block" to 1 and change "#define PRINT 0" to 1, which lets you debug the output using a single GPU thread. It is also recommended to use a single sequence pair so that you can correctly observe the output matrix.
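For the single-pair debugging run suggested above, one way to prepare the input is to copy the first pair from an existing dataset into its own file. This is a minimal sketch that assumes the dataset stores one sequence pair per line; the helper name `take_first_pairs` and the output file name are hypothetical and not part of the repository.

```python
def take_first_pairs(src, dst, count=1):
    """Copy the first `count` non-empty lines of src to dst.

    Assumes one sequence pair per line. Returns the number of lines written.
    """
    written = 0
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if written >= count:
                break
            if not line.strip():
                continue  # skip blank lines
            # Make sure every copied line ends with a newline.
            fout.write(line if line.endswith("\n") else line + "\n")
            written += 1
    return written


# Example (paths are illustrative):
# take_first_pairs("./Datasets/ERR240727_1_E2_30000Pairs.txt", "single_pair.txt")
```

The resulting file could then be passed as the [ReadandRefFile] argument with [#reads] set to 1.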

zjin-lcf commented 3 years ago

Thanks for your suggestions.