brian-cleary / LatentStrainAnalysis

Partitioning and analysis methods for large, complex sequence datasets
MIT License

write_partition_parts.py crashes on test data #19

Open AlphaSquad opened 8 years ago

AlphaSquad commented 8 years ago

With a freshly cloned LSA version, after running

tar -xvf testData.tar.gz original_reads/
bash HashCounting.sh 6 33 22
bash KmerSVDClustering.sh 6 22 .8
bash ReadPartitioning.sh 4

as described in the docs, LSA crashes with the following error:

Tue Apr 19 13:30:32 CEST 2016 partitioning reads in hashed input file 4
parallel: This job failed:
echo $(date) partitioning reads in hashed input file 4; \
python LSA/write_partition_parts.py -r 4 -i hashed_reads/ -o cluster_vectors/ -t tmp/ >> Logs/ReadPartitions.log 2>&1; \
if [ $? -ne 0 ]; then exit 1; fi
printing end of last log file...
Traceback (most recent call last):
  File "LSA/write_partition_parts.py", line 143, in <module>
    while id_vals[0] == r_id:
IndexError: index 0 is out of bounds for axis 0 with size 0

I'm running Ubuntu 14.10, Python 2.7.8, numpy 1.11.0, scipy 0.17.0 and GNU parallel 20160322
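The traceback shows `id_vals[0]` being read when `id_vals` has size 0. As a minimal sketch of how such a loop condition can be guarded (not the actual code in write_partition_parts.py; the function name and the counting logic are hypothetical), the length can be checked before indexing:

```python
def count_leading_matches(id_vals, r_id):
    """Count how many leading entries of id_vals equal r_id.

    Hypothetical helper for illustration; works for lists and for
    1-D numpy arrays, since both support len() and integer indexing.
    """
    n = 0
    # Test the length first so an empty sequence is never indexed.
    while n < len(id_vals) and id_vals[n] == r_id:
        n += 1
    return n
```

With an empty input the loop body is never entered and the function returns 0 instead of raising IndexError.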

igsbma commented 8 years ago

Got exactly the same error, please advise! Thanks!

Lnc commented 8 years ago

I have the same problem.

As a quick and dirty test, I modified lines 132 and 143 in LSA/write_partition_parts.py:

I have no idea if it is correct.


After running bash ReadPartitioning.sh 4 again, I got this error (line 5 of ReadPartitioning.sh):

I replaced \($(($c*400/$t))%\) with \($c x 400 / $t %\) (I do not care about the result).

Output is:

Quantifying spike enrichment
Total spiked reads: 20374
grep: read_partitions/126/*.fastq: No such file or directory
Spiked read counts by partition (top 5)
partition 0: 0 spiked reads out of 130 total reads in partition 0 (0 x 400 / 520 %)
partition 0: 0 spiked reads out of 130 total reads in partition 0 (0 x 400 / 520 %)
partition 0: 0 spiked reads out of 130 total reads in partition 0 (0 x 400 / 520 %)
partition 0: 0 spiked reads out of 130 total reads in partition 0 (0 x 400 / 520 %)
partition 0: 0 spiked reads out of 130 total reads in partition 0 (0 x 400 / 520 %)

(It does not look right.)
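The error message for the `$(($c*400/$t))` expression is not shown in the thread. One way such shell arithmetic can fail is when `$t` is zero or empty. The factor 400 is consistent with `$t` being a FASTQ line count at 4 lines per read (520 / 4 = 130 in the output above), though that reading is an inference. Under that assumption, a sketch of the same integer percentage with a guard for empty partitions, using hypothetical names:

```python
def spike_percent(spiked_reads, fastq_lines):
    """Integer percentage of spiked reads in a partition.

    Assumes fastq_lines is the number of lines in the partition's FASTQ
    files (4 lines per read), so reads = fastq_lines / 4 and
    percent = spiked_reads * 100 / reads = spiked_reads * 400 / fastq_lines.
    Returns None for an empty partition instead of dividing by zero.
    """
    if fastq_lines <= 0:
        return None
    # Floor division mirrors the integer arithmetic of the shell expression.
    return spiked_reads * 400 // fastq_lines
```

For the values printed above, `spike_percent(0, 520)` evaluates to 0.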


The last error is:

In ReadPartitioning.sh, I removed the keyword do from line 43 and inserted it as its own line after line 42 (for i in $(seq 1 $numClusterTasks)). I then inserted the line if [ $((i-1)) -ne 126 ]; then after line 43 (do), and the line fi after the new line 48 (echo partition...).

Again, I have no idea if it is correct.

Output is the same:

Quantifying spike enrichment
Total spiked reads: 20374
Spiked read counts by partition (top 5)
partition 0: 0 spiked reads out of 130 total reads in partition 0 (0 x 400 / 520 %)
partition 0: 0 spiked reads out of 130 total reads in partition 0 (0 x 400 / 520 %)
partition 0: 0 spiked reads out of 130 total reads in partition 0 (0 x 400 / 520 %)
partition 0: 0 spiked reads out of 130 total reads in partition 0 (0 x 400 / 520 %)
partition 0: 0 spiked reads out of 130 total reads in partition 0 (0 x 400 / 520 %)

(Again, it does not look right.)
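The workaround above hardcodes partition 126, the one grep reported as missing. A more general approach is to skip any partition that has no FASTQ files. The sketch below is in Python rather than the shell of ReadPartitioning.sh, the function name is hypothetical, and the directory layout is assumed from the grep message (read_partitions/126/*.fastq):

```python
import glob
import os

def partitions_with_reads(root, num_partitions):
    """Yield (index, fastq_paths) for each partition that has FASTQ files.

    Assumes one subdirectory per partition, named by its index, under root.
    """
    for i in range(num_partitions):
        paths = sorted(glob.glob(os.path.join(root, str(i), "*.fastq")))
        if not paths:
            # Missing or empty partition directory: skip it rather than fail.
            continue
        yield i, paths
```

Because glob returns an empty list for a directory that does not exist, no separate existence check is needed.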


Debian GNU/Linux 8.6 (jessie), Python 2.7.9, NumPy 1.11.2, SciPy 0.18.1, GNU parallel 20130922, GNU bash, version 4.3.30(1)-release (x86_64-pc-linux-gnu)

brian-cleary commented 8 years ago

Hi folks,

Sorry for my mega slowness. I just checked in a new version of write_partition_parts. This addresses a bug introduced by new functionality of numpy fromstring in more recent versions. It should fix the problem with running the test data.
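The thread does not say which fromstring behavior changed, and the committed fix is not shown here. As a sketch only, one way to avoid depending on how a library parser treats empty or partly empty input is to split the text explicitly. The function name and the comma separator below are assumptions, not taken from LSA:

```python
def parse_int_fields(line, sep=","):
    """Parse a delimited line of integers; blank input gives an empty list.

    Hypothetical helper; the separator is an assumption for illustration.
    """
    fields = [f.strip() for f in line.strip().split(sep)]
    # "".split(",") yields [""], so empty fields are dropped explicitly.
    return [int(f) for f in fields if f]
```

The resulting list can be checked for emptiness before use, or passed to numpy.array if an array is needed downstream.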