dhimmel / fratjuice

Uncovering the microbes of fraternity basements
Creative Commons Zero v1.0 Universal

Accessing fastq files from raw uBiome zips #11

Closed vsaraswathula closed 5 years ago

vsaraswathula commented 5 years ago

I am having trouble accessing the fastq files in the raw uBiome zip files. Standard file extraction doesn't work: unzipping XXXXX.zip produces XXXXX.zip.cpgz, and opening XXXXX.zip.cpgz produces another XXXXX.zip file, leading to a frustrating cycle.

In the "fastq" branch of the repository, there are fastq.gz files for forward and reverse reads that I was able to open and analyze in R, so there must be a way to access these files; I just cannot figure it out.

dhimmel commented 5 years ago

Interesting. I've never heard of this issue, but it could arise from a variety of problems. My guess is that the files you're accessing in the master branch are not actually zip files. Instead, they are small text files containing Git LFS pointer information.
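One quick way to test this guess is to look at the first bytes of the downloaded file. The Python sketch below assumes the Git LFS pointer format (spec v1), whose first line is `version https://git-lfs.github.com/spec/v1`, and compares it against the `PK\x03\x04` signature that begins a non-empty zip archive. The function names `classify_download` and `parse_lfs_pointer` are hypothetical helpers written for illustration, not part of this repository.

```python
from pathlib import Path

# First line of every Git LFS pointer file (spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"
# Local file header signature that begins a non-empty zip archive.
ZIP_MAGIC = b"PK\x03\x04"


def classify_download(path):
    """Return 'lfs-pointer', 'zip', or 'other' based on the file's first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    if head.startswith(LFS_POINTER_PREFIX):
        return "lfs-pointer"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "other"


def parse_lfs_pointer(path):
    """Parse the 'key value' lines of a pointer file into a dict,
    e.g. {'version': '...', 'oid': 'sha256:...', 'size': '1234'}."""
    fields = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        key, _, value = line.partition(" ")
        if key:
            fields[key] = value
    return fields
```

If `classify_download("XXXXX.zip")` returns `"lfs-pointer"`, the `size` field from `parse_lfs_pointer` tells you how large the real archive is, and the file on disk is only a placeholder for it.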

Git LFS ("Large File Storage") is what we use for large files. If you don't have it installed on your computer, I think you may have ended up downloading a text file rather than the actual binary zip archive. I hope that installing Git LFS and then cloning the repository fixes the issue.

vsaraswathula commented 5 years ago

This did indeed fix the issue! I installed Git LFS and have accessed the forward and reverse reads in the master branch's zips. Thank you!
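Once the real archives are present, the gzipped reads can also be read directly from Python without extracting them first. This is a minimal sketch, assuming the zip members end in `.fastq.gz` and that each FASTQ record spans exactly four lines; the member names inside the uBiome zips are not listed in this thread, and `count_reads_in_zip` is a hypothetical helper.

```python
import gzip
import zipfile


def count_reads_in_zip(zip_path):
    """Count FASTQ records in every *.fastq.gz member of a zip archive.

    Assumes each record spans exactly four lines
    (header, sequence, '+', qualities).
    """
    counts = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".fastq.gz"):
                continue  # skip non-FASTQ members
            # Stream-decompress the member without writing it to disk.
            with archive.open(name) as raw, gzip.open(raw, "rt") as fastq:
                n_lines = sum(1 for _ in fastq)
            counts[name] = n_lines // 4
    return counts
```

For example, `count_reads_in_zip("XXXXX.zip")` would return a dict mapping each forward and reverse read file in the archive to its number of reads. Streaming through `archive.open` keeps memory use low for large read files.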