Bgi-LUSH / GPMeta

A GPU-accelerated method for ultrarapid pathogen identification from metagenomic sequences.
Apache License 2.0

Error occurring while building the database index #1

Open qiushunHe opened 4 months ago

qiushunHe commented 4 months ago

Hi, I am having difficulty with GPMeta because the instructions in the README are too brief to troubleshoot with.

For instance, when attempting to build the index, the direct use of the GPMetaIndex command does not yield any descriptions of its parameters. This omission leaves me unclear about the purpose and function of the parameters in the command GPMetaIndex human.fasta index_step 0.

Additionally, executing this command results in the error message "Floating point exception (core dumped)" (see the attached screenshot; the reference is hg19 downloaded from UCSC). Without further context, I cannot pinpoint the root cause. More comprehensive documentation would be greatly beneficial. [image]

Furthermore, regarding the section on "loading the reference database to GPU", could you clarify the format required for the "get_species_genus.xls" file and provide guidance on how to prepare it?

tiankong-zhicheng commented 4 months ago

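1: index_step is the step interval used when building the index. You specify an integer, and index construction then skips ahead by that interval, which reduces the size of the generated index files. index_step should not be set too large, because that lowers the accuracy of the final alignment. To traverse the fasta file completely, set index_step to 1.

2: The final parameter of the GPMetaIndex command, 0 or 1, distinguishes human data from pathogen data.

Example 1: GPMetaIndex hg19.fasta 1 0 builds the human index with a step interval of 1.
Example 2: GPMetaIndex seq.fasta 3 1 builds the pathogen index with a step interval of 3.

3: The get_species_genus.xls file holds species information. The first column is the identifier of each sequence in the fasta file, the second column is the species name, and the third column is the genus name. The file format can follow this example (columns separated by '\t'):
NC_1	Buchnera aphidicola	Buchnera
NC_2	Buchnera aphidicola	Buchnera
NC_3	Archangium gephyra	Archangium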

tiankong-zhicheng commented 4 months ago

1: The index_step refers to the step interval specified when building an index. You need to specify an integer to set the interval, allowing the index construction to skip ahead and reduce the size of the generated index files. However, it is not advisable to set index_step too large, as this would reduce the accuracy of the final alignment. If you want to traverse the entire fasta file, index_step can be set to 1.
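To illustrate the trade-off described in point 1, here is a minimal Python sketch of step-sampled indexing. It is not GPMeta's actual implementation: the function name `sampled_kmers`, the fixed-length substring ("k-mer") model, and the value k=4 are assumptions made only for illustration.

```python
def sampled_kmers(sequence, k, index_step):
    """Return (position, k-mer) pairs taken every `index_step` bases.

    Hypothetical helper: models an index that stores one fixed-length
    substring per sampled start position.
    """
    if k <= 0 or index_step <= 0:
        raise ValueError("k and index_step must be positive integers")
    return [
        (pos, sequence[pos:pos + k])
        for pos in range(0, len(sequence) - k + 1, index_step)
    ]


seq = "ACGTACGTACGT"  # toy 12-base sequence
for step in (1, 3):
    entries = sampled_kmers(seq, k=4, index_step=step)
    # A larger step stores fewer entries, so the index is smaller but sparser.
    print(step, len(entries))
```

With a step of 1 every start position is stored; with a step of 3 only every third one is, which is why a larger step shrinks the index at the cost of fewer anchor points for alignment.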

2: The final parameter of the GPMetaIndex command, 0 or 1, distinguishes human data from pathogen data. For example, the command "GPMetaIndex hg19.fasta 1 0" creates an index for human data with a step interval of 1. Similarly, "GPMetaIndex seq.fasta 3 1" creates an index for pathogen data with a step interval of 3.

3: The get_species_genus.xls file represents information about species. The first column contains the identifier of each sequence in the fasta file, the second column contains the species name, and the third column contains the genus of the species. Each column in the file is separated by a tab ("\t"), for example:
NC_1	Buchnera aphidicola	Buchnera
NC_2	Buchnera aphidicola	Buchnera
NC_3	Archangium gephyra	Archangium
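The three-column layout described in point 3 could be generated with a short script along these lines. This is a sketch under stated assumptions, not a tool shipped with GPMeta: the helper names are hypothetical, the sequence identifier is assumed to be the first whitespace-delimited token of each FASTA header, the species names are assumed to come from a mapping you supply yourself, and the genus is assumed to be the first word of the species name.

```python
import csv
import os
import tempfile


def fasta_ids(fasta_path):
    """Yield the first whitespace-delimited token of each FASTA header line."""
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:
                    yield tokens[0]


def write_species_genus(fasta_path, species_by_id, out_path):
    """Write one tab-separated row per sequence: identifier, species, genus."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        for seq_id in fasta_ids(fasta_path):
            species = species_by_id[seq_id]  # KeyError flags an unmapped sequence
            genus = species.split()[0]  # assumption: genus is the first word
            writer.writerow([seq_id, species, genus])


# Demo on a toy FASTA written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    fasta = os.path.join(tmp, "seq.fasta")
    table = os.path.join(tmp, "get_species_genus.xls")
    with open(fasta, "w") as handle:
        handle.write(">NC_1 example record\nACGT\n>NC_3\nGGCC\n")
    mapping = {"NC_1": "Buchnera aphidicola", "NC_3": "Archangium gephyra"}
    write_species_genus(fasta, mapping, table)
    with open(table) as handle:
        rows = handle.read().splitlines()

print(rows)
```

Failing loudly on an unmapped identifier is a deliberate choice here, so that every sequence in the fasta file ends up with exactly one row in the table.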

hedy-ella commented 1 month ago

Hello, I am facing a similar problem. I used 5 sequences for testing, and the server has plenty of CPUs and GPUs available, but I get the error "Segmentation fault (core dumped)" after running the command "GPMetaIndex test.fasta 3 1". Thanks for any reply.