NEONScience / NEON-utilities

Utilities and scripts for working with NEON data. Currently: an R package with functions to join (stack) the month-by-site files in downloaded NEON data, to convert data to geoCSV format, and to download data from the API.
GNU Affero General Public License v3.0

zipsByURI not unzipping with new raw DNA sequence file format #93

Closed lstanish closed 4 years ago

lstanish commented 4 years ago

Function zipsByURI()

Describe the bug: The function only looks for .tar.gz sequence files. However, the newly formatted per-sample files are in .gz format, which the function as written does not handle.

When I add a code chunk to the existing function, I can successfully download raw sequence files in either the .tar.gz or .gz compressed formats. After Line 123 (the .tar.gz code chunk), the following chunk can be added:

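    else if (unzip == TRUE && grepl("\\.fastq\\.gz", i)) {
      # Decompress the downloaded .fastq.gz file, keeping the compressed copy for now
      R.utils::gunzip(paste(savepath, gsub("^.*\\/", "", i), sep = "/"), remove = FALSE)
      if (!saveZippedFiles) {
        # Delete the compressed copy unless the user asked to keep it
        unlink(paste(savepath, gsub("^.*\\/", "", i), sep = "/"), recursive = FALSE)
      }
    }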
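The branching described above can be sketched as a standalone helper. This is only an illustration, not the code in neonUtilities: the function name `decompress_download` and its arguments are hypothetical, and it assumes the R.utils package is installed. The .tar.gz check comes first because a tarball's name also ends in .gz.

```r
# Hypothetical helper (not part of neonUtilities): decompress one downloaded
# file based on its extension. Assumes R.utils is installed.
decompress_download <- function(path, saveZippedFiles = FALSE) {
  if (grepl("\\.tar\\.gz$", path)) {
    # Tarballs: extract the contents next to the archive
    utils::untar(path, exdir = dirname(path))
  } else if (grepl("\\.fastq\\.gz$", path)) {
    # Per-sample files: gunzip writes the file name minus ".gz";
    # remove = FALSE leaves the compressed file in place
    R.utils::gunzip(path, remove = FALSE, overwrite = TRUE)
  } else {
    # Nothing to decompress
    return(invisible(path))
  }
  if (!saveZippedFiles) {
    # Delete the compressed copy unless the caller asked to keep it
    unlink(path, recursive = FALSE)
  }
  invisible(path)
}
```

Calling `decompress_download("testRun/sample.fastq.gz")` (a made-up file name) would leave `testRun/sample.fastq` behind and delete the .gz copy.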

To Reproduce: Run this code to download some mmg metadata:

    test <- loadByProduct(dpID = "DP1.10108.001", site = "CPER", package = "expanded", check.size = FALSE, startdate = "2014-01", enddate = "2014-02")

Download the rawDataFiles and variables files into a folder called "testRun/" for the zipsByURI() function. Then run:

    zipsByURI("path/testRun/")

Expected behavior: The compressed sequence files should unzip when the argument unzip = TRUE.

System:

cklunch commented 4 years ago

@lstanish This is now fixed on GitHub, I think. I had to modify your regex a little bit, and I updated some other parts of the function as well, unrelated to the fastq files. I tested with DP1.10108.001 CPER data. Can you test as well, and let me know if it works as expected?

@kcawley Can you also test, and check if zipsByURI() still works as expected for your files as well?

Thanks!

lstanish commented 4 years ago

@cklunch Awesome, thanks! I tested with 2 CPER files in .fastq.gz format and got the following error:

[Screenshot of the error message: Screen Shot 2020-07-17 at 11 37 17 AM]

Looks like the error is originating from line 181. If you switched my code above from R.utils::gunzip to utils::unzip, that might be the source of the error? remove might not be an argument for that function.

cklunch commented 4 years ago

@lstanish OK, I was able to replicate this and I've pushed an updated version that works on my machine. I was really hoping to avoid making neonUtilities dependent on yet another package, but wasn't able to find an unzipping option outside of R.utils. Since this is the only thing it's used for, I've put R.utils in Suggests instead of Imports, so keep in mind it won't be installed by default when people install neonUtilities.
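Because a package listed in Suggests may be missing on a user's machine, code that calls it usually checks for it first. A minimal sketch of that guard follows; the wrapper name `gunzip_if_available` and the warning text are hypothetical, and this is not necessarily how neonUtilities handles it.

```r
# Hypothetical wrapper: call R.utils::gunzip only if R.utils is installed.
# Returns TRUE if the file was decompressed, FALSE if R.utils is missing.
gunzip_if_available <- function(path, ...) {
  if (!requireNamespace("R.utils", quietly = TRUE)) {
    warning("Package R.utils is needed to unzip .gz files. ",
            "Install it with install.packages('R.utils') and try again.")
    return(invisible(FALSE))
  }
  # Extra arguments (e.g. remove = FALSE) are passed through to gunzip
  R.utils::gunzip(path, ...)
  invisible(TRUE)
}
```

With this pattern the download itself can still succeed when R.utils is absent; only the decompression step is skipped, with a message telling the user what to install.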

Give it a try and let me know if it works for you! I may poke around a bit more for other unzipping options, but most likely we'll have to stick with this one.

lstanish commented 4 years ago

@cklunch OK, I tested downloading some data in the new format and the old format, and both worked as expected! Bummer that this required another dependency. It's too bad that utils can't unzip .gz files.
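For reference, utils::unzip() handles .zip archives only, but base R can read gzip-compressed files through gzfile() connections. The sketch below shows one possible way to decompress a .gz file without R.utils. It is an illustration under that assumption, not the approach adopted in this thread, and the function name `gunzip_base` and its arguments are hypothetical.

```r
# Hypothetical base-R alternative: stream a .gz file through a gzfile()
# connection and write the decompressed bytes to a new file.
gunzip_base <- function(gz_path, dest = sub("\\.gz$", "", gz_path),
                        chunk = 1e6) {
  inp <- gzfile(gz_path, open = "rb")
  on.exit(close(inp), add = TRUE)
  out <- file(dest, open = "wb")
  on.exit(close(out), add = TRUE)
  repeat {
    # Read up to `chunk` decompressed bytes at a time
    buf <- readBin(inp, what = raw(), n = chunk)
    if (length(buf) == 0) break
    writeBin(buf, out)
  }
  invisible(dest)
}
```

Reading in chunks keeps memory use bounded, which matters for large sequence files. The compressed file is left in place; deleting it would be a separate unlink() call.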

cklunch commented 4 years ago

@lstanish Fix is now on CRAN!