SebastianMeyer1989 / UltraPlexer

The UltraPlexer is a kmer-based tool that allows assigning non-barcoded long-read sequences generated by the Oxford Nanopore Technology to isolates, by matching them to barcoded short-read sequences generated by Illumina Technology.
MIT License

perl Math::GSL #2

Closed: samlipworth closed this issue 5 years ago

samlipworth commented 5 years ago

Hi Sebastian,

I'm having a problem with perl dependencies (I know this is not the fault of your code). Do you have any idea how to resolve this? We're excited to try this out.

$ cpanm Math::GSL --force
--> Working on Math::GSL
Fetching http://www.cpan.org/authors/id/L/LE/LETO/Math-GSL-0.40.tar.gz ... OK
==> Found dependencies: Alien::GSL
--> Working on Alien::GSL
Fetching http://www.cpan.org/authors/id/J/JB/JBERGER/Alien-GSL-1.01.tar.gz ... OK
Configuring Alien-GSL-1.01 ... OK
Building and testing Alien-GSL-1.01 ... FAIL
! Installing Alien::GSL failed. See /home/sam/.cpanm/work/1561625575.23673/build.log for details. Retry with --force to force install it.
! Installing the dependencies failed: Module 'Alien::GSL' is not installed
! Bailing out the installation for Math-GSL-0.40.

samlipworth commented 5 years ago

I should add that this is what I get when I try to run UltraPlexer:

$ perl UltraPlexer.pl
Can't locate Math/GSL/Randist.pm in @INC (you may need to install the Math::GSL::Randist module) (@INC contains: /home/sam/Bioprog/UltraPlexer/perlLib /home/sam/anaconda3/lib/site_perl/5.26.2/x86_64-linux-thread-multi /home/sam/anaconda3/lib/site_perl/5.26.2 /home/sam/anaconda3/lib/5.26.2/x86_64-linux-thread-multi /home/sam/anaconda3/lib/5.26.2 .) at UltraPlexer.pl line 10.
BEGIN failed--compilation aborted at UltraPlexer.pl line 10.

SebastianMeyer1989 commented 5 years ago

Thank you for your message. As far as I can tell, the module "Math::GSL::Randist" is missing (which you probably already know). I will look into it, but I don't know whether I can solve it for you. I will also forward this to @AlexanderDilthey, since he is the mind behind the code. Maybe he has an idea.

samlipworth commented 5 years ago

Thanks, yes, that looks like the problem. I can't seem to install it though (on Ubuntu 18.04 LTS, and I've tried on a few other machines). I've posted to Stack Overflow as well, but just wondered if you might have come across this issue too.

AlexanderDilthey commented 5 years ago

Is GSL installed on your system? Many Linux distributions also come with packages for important Perl modules, so it might well be that there is an Ubuntu package for Math::GSL.

SebastianMeyer1989 commented 5 years ago

I work on Ubuntu 16.04 LTS and, yes, some GSL packages are already installed. When I executed "cpanm Math::GSL" in the terminal, it also finished without problems.

samlipworth commented 5 years ago

OK perl dependencies now resolved via manual install (no idea why they wouldn't work via cpanm) but now I have a new error:

Experimental keys on scalar is now forbidden at UltraPlexer.pl line 389.
Type of arg 1 to keys must be hash or array (not anonymous hash ({})) at UltraPlexer.pl line 389, near "};"
Type of arg 1 to List::MoreUtils::XS::all must be block or sub {} (not reference constructor) at UltraPlexer.pl line 793, near "@ll)"
Execution of UltraPlexer.pl aborted due to compilation errors.

samlipworth commented 5 years ago

Which version of Perl is this written in? I know almost nothing about Perl, but I wonder if this is a Perl version problem?

SebastianMeyer1989 commented 5 years ago

The program worked on a machine with Perl v5.16.3. From what I can find online, there are in fact such "experimental key" issues (whatever exactly this means) with some Perl versions. One recommendation is to add another line at the top of the script, so you could try pasting

use 5.012; # so keys/values/each work on arrays

as the first line of the script (above "use strict").

Which version of perl are you using? You can find the number at the bottom of the manual (man perl).
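The version question can also be answered from the output of perl -v. The sketch below is an illustration only and not part of UltraPlexer: it pulls the version number out of that output and checks it against Perl 5.24, the release in which calling keys on a scalar reference (the experimental "autoderef" feature) was removed. The function names and the sample strings are made up for this example; the two versions shown are the ones that appear in this thread (5.26.2 in the @INC paths above, 5.16.3 on the machine where the script worked).

```python
import re

# Perl 5.24 removed the experimental "autoderef" feature, which is what
# allowed keys/values/each to be called on a scalar reference.
AUTODEREF_REMOVED_IN = (5, 24, 0)


def parse_perl_version(perl_v_output):
    """Extract (major, minor, patch) from the text printed by `perl -v`."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", perl_v_output)
    if match is None:
        raise ValueError("no Perl version string found")
    return tuple(int(part) for part in match.groups())


def rejects_keys_on_scalar(version):
    """True if this Perl version no longer accepts keys on a scalar reference."""
    return version >= AUTODEREF_REMOVED_IN


if __name__ == "__main__":
    # Sample lines in the format `perl -v` prints; illustrative, not copied from the thread.
    samples = [
        "This is perl 5, version 26, subversion 2 (v5.26.2) built for x86_64-linux-thread-multi",
        "This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi",
    ]
    for line in samples:
        version = parse_perl_version(line)
        print(version, "rejects keys on scalar:", rejects_keys_on_scalar(version))
```

Under this check, 5.26.2 falls on the side where the old syntax is a compile-time error, while 5.16.3 still accepts it, which is consistent with the script compiling on one machine and not the other.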

AlexanderDilthey commented 5 years ago

@samlipworth I have pushed an update; please carry out a pull from GitHub and try again. If this still doesn't work, please post the output of perl -v in this thread.

samlipworth commented 5 years ago

Now the script executes on my laptop (but not on our cluster, where I'm still battling with Perl dependencies). Perhaps it would be sensible to make a conda/Docker release to make this easier?

When I execute on my laptop I get the error below. My machine has 16 GB of memory, so I think it ought to be able to handle the data. It looks like a Cortex problem?

Sample RHB11-C13, have 426.45 Mb of sequencing data

-----
Fri Jun 28 13:08:29 2019
Starting Cortex, version 1.0.5.21
Command: /home/sam/Bioprog/cortex/bin/cortex_var_31_c60 --mem_height 20 --mem_width 100 --pe_list /home/sam/Bioprog/UltraPlexer/cortex_temp/prefix1_RHB11-C13_19.ctx.pe1,/home/sam/Bioprog/UltraPlexer/cortex_temp/prefix1_RHB11-C13_19.ctx.pe2 --dump_binary /home/sam/Bioprog/UltraPlexer/cortex_temp/prefix1_RHB11-C13_19.ctx --kmer_size 19 --remove_pcr_duplicates --quality_score_threshold 5 --dump_covg_distribution /home/sam/Bioprog/UltraPlexer/cortex_temp/prefix1_RHB11-C13_19.ctx.coverage 
Maximum k-mer size (compile-time setting): 31
Actual K-mer size: 19
could not allocate hash table of size 104857600
Error: Giving up - unable to allocate memory for the hash table
Command /home/sam/Bioprog/cortex/bin/cortex_var_31_c60 --mem_height 20 --mem_width 100 --pe_list /home/sam/Bioprog/UltraPlexer/cortex_temp/prefix1_RHB11-C13_19.ctx.pe1,/home/sam/Bioprog/UltraPlexer/cortex_temp/prefix1_RHB11-C13_19.ctx.pe2 --dump_binary /home/sam/Bioprog/UltraPlexer/cortex_temp/prefix1_RHB11-C13_19.ctx --kmer_size 19 --remove_pcr_duplicates --quality_score_threshold 5 --dump_covg_distribution /home/sam/Bioprog/UltraPlexer/cortex_temp/prefix1_RHB11-C13_19.ctx.coverage failed at UltraPlexer.pl line 1182.
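The hash table size in the log follows directly from the command-line options: 2^20 buckets (--mem_height 20) times 100 entries per bucket (--mem_width 100) gives 104857600 entries. The sketch below reproduces that number and adds a rough memory estimate. The per-entry cost of 8 * ceil(k/32) + 5 * colours + 1 bytes is an assumption based on how the Cortex documentation is usually summarised, not something stated in this thread, so treat the byte totals as approximate. The function names are hypothetical.

```python
import math


def cortex_hash_entries(mem_height, mem_width):
    """Entries in the hash table: 2**mem_height buckets, mem_width entries each."""
    return (2 ** mem_height) * mem_width


def estimated_bytes(mem_height, mem_width, max_kmer, colours):
    """Rough memory estimate for the hash table.

    ASSUMPTION: each entry costs 8 * ceil(max_kmer / 32) + 5 * colours + 1 bytes,
    where max_kmer and colours are the compile-time settings of the binary
    (e.g. cortex_var_31_c60 -> max_kmer=31, colours=60).
    """
    bytes_per_entry = 8 * math.ceil(max_kmer / 32) + 5 * colours + 1
    return cortex_hash_entries(mem_height, mem_width) * bytes_per_entry


if __name__ == "__main__":
    print("entries:", cortex_hash_entries(20, 100))
    for colours in (60, 20):
        total = estimated_bytes(20, 100, max_kmer=31, colours=colours)
        print(f"{colours} colours: {total / 1e9:.1f} GB")
```

Under that assumption, the 60-colour binary would need roughly 32.4 GB for this table, which does not fit in 16 GB, whereas a 20-colour build would need about 11.4 GB.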

samlipworth commented 5 years ago

There's also an R script referenced in the perl text but I don't think this is in the repository?

samlipworth commented 5 years ago

So I changed the max colours in Cortex to always be 20, and Cortex now completes, but then I get the following. Sorry to keep posting; I'm really keen to get this to work, but I've tried for several hours and can't.

$ perl UltraPlexer.pl --prefix prefix1 --action classify --samples_file samples --longReads_FASTQ all.fastq
Sample RHB11-C13, have 426.45 Mb of sequencing data

Fri Jun 28 16:58:43 2019
Starting Cortex, version 1.0.5.21
Command: /home/sam/Bioprog/cortex/bin/cortex_var_31_c20 --mem_height 20 --mem_width 100 --pe_list /home/sam/Bioprog/UltraPlexer/cortex_temp/prefix1_RHB11-C13_19.ctx.pe1,/home/sam/Bioprog/UltraPlexer/cortex_temp/prefix1_RHB11-C13_19.ctx.pe2 --dump_binary /home/sam/Bioprog/UltraPlexer/cortex_temp/prefix1_RHB11-C13_19.ctx --kmer_size 19 --remove_pcr_duplicates --quality_score_threshold 5 --dump_covg_distribution /home/sam/Bioprog/UltraPlexer/cortex_temp/prefix1_RHB11-C13_19.ctx.coverage
Maximum k-mer size (compile-time setting): 31
Actual K-mer size: 19
Hash table created, number of buckets: 1048576
No SE data
Input file of paired end data: /home/sam/Bioprog/UltraPlexer/cortex_temp/prefix1_RHB11-C13_19.ctx.pe1, and /home/sam/Bioprog/UltraPlexer/cortex_temp/prefix1_RHB11-C13_19.ctx.pe2, quality cut-off: 5
Removing duplicates from the paired end files when both mates start with the same kmer

Fri Jun 28 16:58:43 2019
Collisions: tries 0: 347500623

Fri Jun 28 17:00:12 2019
Sequence data loaded
Total bases parsed:426451880
Total bases passing filters and loaded into graph:394868470
Mean read length after filters applied:149
Setting the following per-colour sequencing error rates (used only for genotyping):
Colour Rate
0 0.010
1 0.010
2 0.010
3 0.010
4 0.010
5 0.010
6 0.010
7 0.010
8 0.010
9 0.010
10 0.010
11 0.010
12 0.010
13 0.010
14 0.010
15 0.010
16 0.010
17 0.010
18 0.010
19 0.010
Total kmers in table: 19405260
The following is a summary of the data that has been loaded, immediately after loading (prior to any error cleaning, calling etc)

SUMMARY:
Colour SampleID MeanReadLen TotalSeq ErrorCleaning LowCovSupsThresh LowCovNodesThresh PoolagainstWhichCleaned
0 undefined 149 394868470 UNCLEANED -1 -1 undefined
1 undefined 0 0 UNCLEANED -1 -1 undefined
2 undefined 0 0 UNCLEANED -1 -1 undefined
3 undefined 0 0 UNCLEANED -1 -1 undefined
4 undefined 0 0 UNCLEANED -1 -1 undefined
5 undefined 0 0 UNCLEANED -1 -1 undefined
6 undefined 0 0 UNCLEANED -1 -1 undefined
7 undefined 0 0 UNCLEANED -1 -1 undefined
8 undefined 0 0 UNCLEANED -1 -1 undefined
9 undefined 0 0 UNCLEANED -1 -1 undefined
10 undefined 0 0 UNCLEANED -1 -1 undefined
11 undefined 0 0 UNCLEANED -1 -1 undefined
12 undefined 0 0 UNCLEANED -1 -1 undefined
13 undefined 0 0 UNCLEANED -1 -1 undefined
14 undefined 0 0 UNCLEANED -1 -1 undefined
15 undefined 0 0 UNCLEANED -1 -1 undefined
16 undefined 0 0 UNCLEANED -1 -1 undefined
17 undefined 0 0 UNCLEANED -1 -1 undefined
18 undefined 0 0 UNCLEANED -1 -1 undefined
19 undefined 0 0 UNCLEANED -1 -1 undefined

Fri Jun 28 17:00:12 2019
Input data was fasta/q, so dump single colour binary file: /home/sam/Bioprog/UltraPlexer/cortex_temp/prefix1_RHB11-C13_19.ctx

Fri Jun 28 17:00:35 2019
Binary dumped

Fri Jun 28 17:00:35 2019
Dump kmer coverage distribution for colour 0 to file /home/sam/Bioprog/UltraPlexer/cortex_temp/prefix1_RHB11-C13_19.ctx.coverage

Fri Jun 28 17:00:36 2019
Covg distribution dumped

Fri Jun 28 17:00:36 2019
Cortex completed - y'all have a nice day!
Fatal error: cannot open file 'plotCoverage.R': No such file or directory
Command Rscript plotCoverage.R /home/sam/Bioprog/UltraPlexer/cortex_temp/prefix1_RHB11-C13_19.ctx.coverage RHB11-C13 failed at UltraPlexer.pl line 1187.

samlipworth commented 5 years ago

I think the issue is the missing R script because you call for the output of that later on from the look of it - can you please provide plotCoverage.R in the repository?

AlexanderDilthey commented 5 years ago

@samlipworth You're absolutely right, just added the file! Sorry!

samlipworth commented 5 years ago

@AlexanderDilthey thank you for your help. This is now running and I'm really excited to try it on some real data; I will let you know how we get on. Some suggestions for possible enhancements, if you find the time (haha!): something to get round the problem of Perl dependencies (conda/Docker perhaps), and parallelisation or some other way to boost the speed. I'm going to close this now as the immediate issue is solved.

AlexanderDilthey commented 5 years ago

@samlipworth Thank you! Yes, feedback always appreciated! One thing we also realized we need is a small "test data" package for users to make sure their local installation works - we're assembling this at the moment and will push to GitHub ASAP.