ArtRand / signalAlign

HMM-HDP models for MinION signal alignments
MIT License

How to reuse files #18

Closed hd00ljy closed 5 years ago

hd00ljy commented 5 years ago

I cloned the git repository, installed the program, and tried to run it, but it could not locate the fast5 files. So I downloaded the Docker image (Dec. 04, 2018) and tried again.

I have two issues: reuse of intermediate files, and the size of reference.txt.


  1. Intermediate file reuse

I noticed that a lot of time is spent creating the files below:

backward_reference.txt forward_reference.txt temp_bwaIndex.amb temp_bwaIndex.ann temp_bwaIndex.pac

I want to run signalAlign on the whole human genome multiple times, and it seems like a waste of time to regenerate these files on every run. Is there a way to reuse them?


  2. The size of reference.txt

I am analyzing human WGS data, but each reference.txt file (forward and backward) is only 238 MB rather than the roughly 3 GB I expected.

Could you tell me what I have done wrong?

-rw-r--r-- 1 root root  20K Dec 4 23:42 temp_bwaIndex.amb
-rw-r--r-- 1 root root 445K Dec 4 23:42 temp_bwaIndex.ann
-rw-r--r-- 1 root root 1.5G Dec 4 23:42 temp_bwaIndex.pac
-rw-r--r-- 1 root root 238M Dec 4 23:42 backward_reference.txt
-rw-r--r-- 1 root root 238M Dec 4 23:42 forward_reference.txt

The command I used is below:

/home/signalAlign/bin/runSignalAlign -d basecalled/Sample_ID/GA10000/ --degenerate cytosine3 -r /data/REFERENCE/hg38/Homo_sapiens_assembly38.fasta -o Sample_ID_test_extract_signalalign > Sample_ID_test_extract_signalalign.out 2> Sample_ID_test_extract_signalalign.err
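
To help narrow down the reference size question, here is a minimal Python sketch (not part of signalAlign) that tallies the length of every sequence in a FASTA file and reports the record whose length is closest to a given byte count. The helper names `fasta_lengths` and `closest_record` are made up for this illustration, and the 238 MiB target is only an approximation read off the `ls` listing. If a single record's length matches the size of forward_reference.txt, that might suggest only one sequence of the reference was written, but this is a guess and not something confirmed about signalAlign's behavior.

```python
import sys


def fasta_lengths(handle):
    """Return a list of (name, length) pairs, one per FASTA record."""
    lengths = []
    name, count = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                lengths.append((name, count))
            parts = line[1:].split()
            name = parts[0] if parts else ""
            count = 0
        elif name is not None:
            count += len(line)
    if name is not None:
        lengths.append((name, count))
    return lengths


def closest_record(lengths, size_bytes):
    """Return the (name, length) pair whose length is nearest size_bytes."""
    return min(lengths, key=lambda item: abs(item[1] - size_bytes))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python fasta_sizes.py Homo_sapiens_assembly38.fasta
    with open(sys.argv[1]) as fh:
        records = fasta_lengths(fh)
    total = sum(length for _, length in records)
    # `ls -h` reports binary units, so 238M is about 238 * 1024**2 bytes.
    target = 238 * 1024 ** 2
    print("records:", len(records), "total bases:", total)
    print("closest to 238M:", closest_record(records, target))
```

Comparing the total base count and the closest single record against the 238M files should show whether the intermediate reference covers the whole genome or just one sequence.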