AndrewSkelton opened this issue 9 years ago
Hi Andrew, thanks for the feedback. I think you might be right there. I know I've run about 100 through, though, so you should be able to get a bit more than 27 out! Are you able to make the paths to your files shorter, e.g. by changing your working directory to something closer to the files?
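To see why shorter paths help, here is a minimal Python sketch (not part of lumidat) that compares the length of a command line built from absolute paths with one built from paths relative to the IDAT directory. It assumes, as the issue suggests, that every IDAT path is passed as a separate argument to the Java call; the directory and file names are hypothetical, and shell quoting is ignored.

```python
import os


def command_length(args):
    """Length of a command line whose arguments are joined by single spaces."""
    return len(" ".join(args))


def shorten_paths(paths, workdir):
    """Rewrite each path relative to `workdir` (e.g. the folder holding the IDATs)."""
    return [os.path.relpath(p, workdir) for p in paths]


if __name__ == "__main__":
    # Hypothetical layout: 200 IDAT files deep inside a project tree.
    idat_dir = "/home/user/projects/illumina/run_2014_11/raw_data/idats"
    paths = [os.path.join(idat_dir, f"sample_{i:03d}_Grn.idat") for i in range(200)]
    base = ["java", "-jar", "lumidat-1.2.2.jar"]
    print("absolute:", command_length(base + paths))
    print("relative:", command_length(base + shorten_paths(paths, idat_dir)))
```

Running from inside the IDAT directory and passing relative names shrinks each argument to just the file name, which is the effect of moving the working directory closer to the files.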
It might be a while before I can look into this, so please report back if you're able to break your job into smaller batches and then merge the expression data objects in R using cbind.
Cheers, Mark

On Sat, 29 Nov 2014 at 2:18 am, Andrew Skelton notifications@github.com wrote:
- I get ~27 samples in the resultant lumibatch object
— Reply to this email directly or view it on GitHub https://github.com/drmjc/lumidat/issues/2.
Hi Mark,
I ran the compiled jar in isolation, with everything in the same directory (thus minimising the path lengths). That ran fine (although I had to give Java ~8GB, but that's by the by), with two warnings:
WARNING, retained 4188 probes with low numbers of beads. These may cause havoc in downstream analysis, like the lumi pipeline.
WARNING, retained 81 probes with low numbers of beads. These may cause havoc in downstream analysis, like the lumi pipeline.
Wrote: Sample Probe Profile.txt
Wrote: Control Probe Profile.txt
I thought it was a bit unusual that it threw two of those warnings. Any thoughts?
With regard to combining multiple outputs, the sample probe profile shouldn't be an issue, as you can give lumiR more than one file and it'll do all the combining. I'm not sure how to go about combining the control probe profile files, but I'll take a look.
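One possible way to combine several control probe profile files is a column-wise merge on a shared probe identifier. The Python sketch below assumes, hypothetically, that each file is a tab-delimited table with a single header row, a `ProbeID` key column, and per-sample columns whose names differ between batches; the real lumidat output may have extra header lines or different column names, so check a file before relying on this. `merge_profiles` is a hypothetical helper, not part of lumidat.

```python
import csv
import io


def merge_profiles(texts, key="ProbeID"):
    """Merge tab-delimited profile tables column-wise on a shared key column.

    Hypothetical format: one header row, a `key` column, per-sample columns.
    Columns already seen (e.g. an annotation column such as TargetID) are taken
    from the first table; only new column names are appended from later tables.
    """
    if not texts:
        raise ValueError("no profiles to merge")
    header, merged = None, None
    for i, text in enumerate(texts):
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        rows = {row[key]: row for row in reader}
        if merged is None:
            header, merged = list(reader.fieldnames), rows
            continue
        # Every batch should describe the same set of control probes.
        if set(rows) != set(merged):
            raise ValueError(f"table {i} has a different set of {key} values")
        new_cols = [c for c in reader.fieldnames if c not in header]
        header.extend(new_cols)
        for probe, row in rows.items():
            for c in new_cols:
                merged[probe][c] = row[c]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(merged.values())
    return out.getvalue()
```

Each file's text can be read with `open(path, newline="").read()` and the merged string written back out as a single profile; rows keep the order of the first file.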
Hi Andrew, thanks for the update, that's great.
That message is printed by the writeFinalOutput method, which is called once when writing the sample probe file and once when writing the control probe file, hence the two warnings.
If you're using the Java implementation, there are two ways to handle large numbers of arrays: a single large Zip file of all the IDATs, or sending the filenames via stdin:
{code}
$ java -jar lumidat-1.2.2.jar
Welcome to lumidat, version 1.2.2.
Mark Cowley, Garvan Institute of Medical Research (2013).

ERROR: no input files identified.
usage: java [-Xmx1024m] -jar lumidat-1.2.2.jar
{code}
If you're using the R implementation, the ‘zip.file’ option is available in lumiR.idat or read.ilmn.
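For the Zip route, here is a small Python sketch that bundles every `.idat` file from one directory into a single flat archive. Whether lumidat expects the files at the archive root (rather than inside a folder) is an assumption, and `zip_idats` is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def zip_idats(idat_dir, zip_path):
    """Bundle every *.idat file in `idat_dir` into one flat Zip archive.

    Returns the sorted list of file names stored in the archive.
    """
    idats = sorted(Path(idat_dir).glob("*.idat"))
    if not idats:
        raise FileNotFoundError(f"no .idat files found in {idat_dir}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in idats:
            # arcname drops the directory, so entries sit at the archive root.
            zf.write(p, arcname=p.name)
    return [p.name for p in idats]
```

For example, `zip_idats("idats", "all_idats.zip")` would produce one archive that could then be given to the ‘zip.file’ option of lumiR.idat.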
A generic solution would be to refactor the R interface to send the IDAT paths via stdin to the underlying Java process. Given there’s already a reasonable workaround, this is going to be fairly low on my priority list. I’d be very happy to merge a pull request from you if you’re able to do this.
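The stdin idea can be illustrated with a minimal Python sketch: the paths are written one per line to the child process's standard input, so the command line itself stays short however many arrays there are. The one-path-per-line format is an assumption about what the lumidat jar accepts, `run_with_paths_on_stdin` is a hypothetical helper, and the demo uses a tiny Python child process as a stand-in for the Java call.

```python
import subprocess
import sys


def run_with_paths_on_stdin(cmd, paths):
    """Run `cmd`, sending one path per line on stdin instead of as arguments.

    Assumption: the receiving program reads newline-separated paths from stdin.
    Returns the child's stdout; raises CalledProcessError on a non-zero exit.
    """
    payload = "".join(f"{p}\n" for p in paths)
    result = subprocess.run(cmd, input=payload, text=True,
                            capture_output=True, check=True)
    return result.stdout


if __name__ == "__main__":
    paths = [f"/data/idats/sample_{i:03d}_Grn.idat" for i in range(200)]
    # Stand-in child process that just counts the paths it receives.
    # A real call might be ["java", "-Xmx8g", "-jar", "lumidat-1.2.2.jar"],
    # assuming the jar reads paths from stdin in this format.
    counter = [sys.executable, "-c",
               "import sys; print(len(sys.stdin.read().splitlines()))"]
    print(run_with_paths_on_stdin(counter, paths).strip())  # prints 200
```

An R version of the same idea would write the paths to the Java process's stdin rather than pasting them into the command string.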
Finally, if you did have to combine batches, I’d just run lumiR.idat on each batch and then cbind the objects.
cheers, Mark
Is there any source code available for that jar file?
Hi, I tried out lumidat on a large number of IDAT files (~200) and I get truncation errors. I believe this might be because the resulting java call will be extremely long.
Maybe it's worth making one Java call per array (as in the 12 arrays on an HT-12), producing the sample probe profile and control probe profile for each, and then combining them at the end? Or do all the files need to be loaded at once?
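Splitting the file list into fixed-size groups is straightforward. Below is a minimal Python sketch, assuming one IDAT file per array so that a batch size of 12 corresponds to one HT-12 BeadChip. `batches` is a hypothetical helper; each batch would still need its own Java (or lumiR.idat) call, with the outputs combined afterwards, e.g. with cbind in R as Mark suggested.

```python
def batches(items, size=12):
    """Split `items` into consecutive batches of at most `size` elements.

    The default of 12 assumes one IDAT per array and 12 arrays per HT-12 chip.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    # Hypothetical file names for ~200 arrays.
    paths = [f"sample_{i:03d}_Grn.idat" for i in range(200)]
    groups = batches(paths)
    print(len(groups), len(groups[-1]))  # prints "17 8"
```

With ~200 files this gives 16 full batches of 12 plus a final batch of 8, each short enough to keep the command line well below the length that caused truncation.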