bioinfo-ut / GenomeTester4

A toolkit for performing set operations - union, intersection and complement - on k-mer lists.
GNU General Public License v3.0

Help: GenomeTester4 configuration #8

taranglute closed this issue 6 years ago

taranglute commented 6 years ago

Hi

We are analysing bioinformatics tools for k-mer counting, and GenomeTester4 is one of them. We have gone through the README file and found it very helpful. Since this is a comparative analysis, we want to be sure we have the details right. It would be great if you could help us validate the details below.

- Data structure and sorting algorithm: Array / Sorting
- Approach: Disk based
- Limit of k-size: less than 33
- Supports online k-mer frequency retrieval: No
- Supports compressed file processing: No

Thanks,
Tarang

bioinfo-ut commented 6 years ago

This repository actually contains two different tools/packages. GenomeTester4 creates a list of ALL k-mers found in the given FASTQ files and handles those lists. The lists are sorted and stored on disk as a single file. Compile it by typing `make all`. Citation: Kaplinski, 2015, GigaScience
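To illustrate why keeping the lists sorted is convenient, here is a minimal Python sketch of an intersection over two sorted k-mer lists. It is not GenomeTester4's code or file format. The function names (`encode_kmer`, `build_list`, `intersect`) are hypothetical, the 2-bit-per-base packing is only one common way to arrive at a k limit of 32 (assumed here, not confirmed for this tool), and reverse complements are ignored for brevity.

```python
def encode_kmer(kmer):
    """Pack a DNA k-mer into an integer using 2 bits per base.

    Assumption for illustration: with 2 bits per base, a 64-bit word
    holds at most 32 bases, which is one common reason for k < 33.
    """
    codes = {"A": 0, "C": 1, "G": 2, "T": 3}
    if len(kmer) > 32:
        raise ValueError("k-mer does not fit into 64 bits")
    value = 0
    for base in kmer:
        value = (value << 2) | codes[base]
    return value


def build_list(sequence, k):
    """Return a sorted list of (encoded k-mer, count) pairs for one sequence."""
    counts = {}
    for i in range(len(sequence) - k + 1):
        code = encode_kmer(sequence[i:i + k])
        counts[code] = counts.get(code, 0) + 1
    return sorted(counts.items())


def intersect(list_a, list_b):
    """Merge-style intersection of two sorted lists in a single pass.

    Returns (code, count_in_a, count_in_b) for every shared k-mer.
    """
    i = j = 0
    shared = []
    while i < len(list_a) and j < len(list_b):
        code_a, count_a = list_a[i]
        code_b, count_b = list_b[j]
        if code_a < code_b:
            i += 1
        elif code_a > code_b:
            j += 1
        else:
            shared.append((code_a, count_a, count_b))
            i += 1
            j += 1
    return shared


list_a = build_list("ACGTACG", 3)
list_b = build_list("TACGG", 3)
print(intersect(list_a, list_b))
```

Because both inputs are sorted, the intersection only needs one linear pass over each list, and the same merge pattern extends to union and complement.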

The FastGT package counts the frequencies of USER-PROVIDED k-mers in a given FASTQ file. It is a much faster tool for analysing a subset of k-mers. It uses a custom hybrid data structure for storing k-mer frequencies during processing. Compile it by typing `make gmer_counter gmer_caller`. Citation: Pajuste, 2017, Scientific Reports
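The idea of counting only a predefined subset of k-mers can be sketched in a few lines of Python. This is an illustration only: it uses a plain dictionary rather than FastGT's hybrid data structure, the function names are hypothetical, and it assumes simple four-line FASTQ records and k-mers of one common length.

```python
import io


def read_fastq_sequences(handle):
    """Yield the sequence line of each record.

    Assumes strict four-line FASTQ records (header, sequence, '+', qualities).
    """
    for line_number, line in enumerate(handle):
        if line_number % 4 == 1:
            yield line.strip().upper()


def count_selected_kmers(handle, kmers):
    """Count occurrences of the user-provided k-mers only.

    All other k-mers in the reads are ignored, so memory use depends on
    the size of the provided list, not on the size of the input.
    """
    lengths = {len(kmer) for kmer in kmers}
    if len(lengths) != 1:
        raise ValueError("all k-mers must have the same length")
    k = lengths.pop()
    counts = dict.fromkeys(kmers, 0)
    for sequence in read_fastq_sequences(handle):
        for i in range(len(sequence) - k + 1):
            window = sequence[i:i + k]
            if window in counts:
                counts[window] += 1
    return counts


# Tiny in-memory FASTQ used as example input.
fastq_text = "@r1\nACGTACG\n+\nIIIIIII\n@r2\nTTACG\n+\nIIIII\n"
print(count_selected_kmers(io.StringIO(fastq_text), ["ACG", "GGG"]))
```

Restricting the lookup table to the k-mers of interest is what makes this kind of counting cheaper than building a list of every k-mer in the reads.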