WGSExtract / WGSExtract.github.io

WGS Extract WWW home
https://WGSExtract.github.io/
GNU General Public License v3.0

Reference genome not decompressing, inaccessible. #11

NathanTBurgess closed this issue 2 years ago.

NathanTBurgess commented 2 years ago

I am unable to download any reference genomes. Here are the error messages. Any suggestions?

Screenshot 2022-08-07 171619

I am trying to convert my Nebula CRAM file into a 23andMe file. I wrote my own program that does this, but I want to check whether it is accurate.

RandyHarr commented 2 years ago

The reference genome downloaded just fine.

Did you start the script with the installer or by some other means?

The problem is that the script could not process the downloaded file because it could not find the bioinformatic tools. Somehow the script was started without win10tools/bin, the directory that holds the bioinformatic tools, on its path. That should have been set up in the shell before the call to get_and_process_refgenomes.sh, which calls process_reference_genomes.sh. When the script is called from the installer, the environment is fully set up with the proper path.

It is also possible that your install was incomplete, meaning the Cygwin64 environment was installed but the bioinformatic tools were not added to it. They come as two separate ZIP files to install. In v3, they are both combined in win10tools/bin.
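A minimal sketch of the path check described above, assuming Python is available to run it: put the tools directory first on PATH, then confirm each executable resolves before the reference-genome scripts are launched. The tool names (samtools, bgzip) and the install-relative location win10tools/bin are assumptions for illustration; the thread only says "bioinformatic tools" and does not list what process_reference_genomes.sh calls, or its arguments, so launching the script itself is left out.

```python
import os
import shutil

# Hypothetical examples: the thread does not name the executables
# that the reference-genome scripts depend on.
REQUIRED_TOOLS = ["samtools", "bgzip"]

def env_with_tools(tools_bin, base_env=None):
    """Copy of the environment with tools_bin placed first on PATH."""
    env = dict(os.environ if base_env is None else base_env)
    parts = [tools_bin, env.get("PATH", "")]
    env["PATH"] = os.pathsep.join(p for p in parts if p)
    return env

def missing_tools(env, tools=REQUIRED_TOOLS):
    """Names from `tools` that cannot be found on env's PATH."""
    return [t for t in tools if shutil.which(t, path=env.get("PATH", "")) is None]

if __name__ == "__main__":
    # Assumed location, relative to the WGS Extract install folder.
    env = env_with_tools(os.path.join("win10tools", "bin"))
    missing = missing_tools(env)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools found; scripts could be launched with this env.")
```

If anything is reported missing even with the directory on PATH, that points to the second cause above: the tools ZIP was never unpacked into the Cygwin64 environment.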

NathanTBurgess commented 2 years ago

I've reinstalled it several times using the Install_Win10 batch file. I am going to try re-downloading the original ZIP file. Then I'll try installing it in a virtual Linux environment. I'll let you know what happens.

Here is the only part of the installation process that might have been outside the norm.

Screenshot 2022-08-08 112543

NathanTBurgess commented 2 years ago

Screenshot 2022-08-08 115453

Good sign ;p I guess there was just an issue with the download.

NathanTBurgess commented 2 years ago

I still can't select it as a reference genome.

Screenshot 2022-08-08 115808

NathanTBurgess commented 2 years ago

Also, to convert the CRAM file to a 23andMe file, is the procedure:

Index > Stats > microarray RAW > 23andMe?

RandyHarr commented 2 years ago

That is standard for pip and the Python libraries.

Randy (from phone, DYACs involved), replying by email to Nathan's August 8, 2022, 11:40 AM comment.

RandyHarr commented 2 years ago

That screen is for selecting a new location for your reference library, in case you plan to move it to another disk. The tool automatically determines which reference genome it needs and then looks for it in the reference library.

Replying by email to Nathan's August 8, 2022, 11:58 AM comment.
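The thread does not say how WGS Extract works out which reference a CRAM needs. One common approach, shown here only as a sketch, reads the @SQ lines of the header (for example the text printed by `samtools view -H`) and compares the length of chromosome 1 with known builds: 249,250,621 bases in GRCh37/hg19 and 248,956,422 in GRCh38/hg38. The function name and the two-entry table are illustrative assumptions; a real tool would compare more contigs.

```python
# Known chromosome 1 lengths; a real tool would compare more contigs.
CHR1_LENGTH_TO_BUILD = {
    249250621: "GRCh37/hg19",
    248956422: "GRCh38/hg38",
}

def guess_build(header_text):
    """Guess the reference build from SAM/CRAM header text via its @SQ lines."""
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # @SQ fields look like SN:chr1<TAB>LN:248956422
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if fields.get("SN") in ("1", "chr1"):
            return CHR1_LENGTH_TO_BUILD.get(int(fields.get("LN", "0")), "unknown")
    return "unknown"
```

Both `1` and `chr1` are accepted because files aligned with and without the `chr` prefix name the same chromosome differently.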

RandyHarr commented 2 years ago

Yes. The manual has lots of detail.

Randy (from phone, DYACs involved), replying by email to Nathan's August 8, 2022, 12:00 PM comment.

RandyHarr commented 2 years ago

So is this issue resolved? I am not sure I helped at all yet but ...

NathanTBurgess commented 2 years ago

I think so, just one more question. If my 30x Nebula CRAM file comes back as 0x, what is the most likely cause?

RandyHarr commented 2 years ago

General questions should go to the Facebook group.

It could be a number of things. How big is the CRAM? How many Gbases does the tool show it has (RAW or mapped)? How many read segments? If all the numbers are zero, then maybe the file was downloaded incorrectly or could not be read; the tool should report that as an error, but... If the numbers are in the gigabases, with tens to hundreds of millions of read segments, maybe they did a WES, or you ordered their 0.4x WGS product (which the tool would show as 0x due to rounding). Or they delivered that instead of a 30x.

Randy (from phone, DYACs involved), replying by email to Nathan's August 9, 2022, 12:23 PM comment.
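Mean coverage is roughly total mapped bases divided by genome length, which is why low-pass data can display as 0x. A sketch of that arithmetic and of the zero-versus-low checks in the reply above, assuming a genome length of about 3.1 Gbases and rounding to whole numbers for display; WGS Extract's exact formula is not given in the thread, and the inputs in the usage lines are illustrative, not from this thread.

```python
# Assumption: ~3.1 Gbases as the human genome length used for averaging.
# WGS Extract's exact denominator and rounding are not described here.
GENOME_LENGTH = 3.1e9

def mean_coverage(mapped_bases, genome_length=GENOME_LENGTH):
    """Average depth: total mapped bases spread over the genome length."""
    return mapped_bases / genome_length

def triage(mapped_bases, read_segments):
    """Rough reading of the Stats numbers, following the checks in the reply."""
    if mapped_bases == 0 and read_segments == 0:
        return "no data read: file likely downloaded incorrectly or unreadable"
    cov = mean_coverage(mapped_bases)
    if round(cov) == 0:
        # e.g. a 0.4x low-pass WGS displays as 0x once rounded to a whole number
        return f"~{cov:.1f}x: low-pass data that rounds to 0x"
    return f"~{cov:.0f}x"

if __name__ == "__main__":
    print(triage(1.24e9, 8_000_000))   # ~0.4x: low-pass data that rounds to 0x
    print(triage(93e9, 620_000_000))   # ~30x
```

A 30x file should therefore carry on the order of 90+ Gbases of mapped sequence; seeing that much data alongside a 0x result points to a reading or statistics error rather than a low-pass product.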

NathanTBurgess commented 2 years ago

Oh sorry, I'll ask questions there from now on. It is 44.5 GB. Thanks for all of your help!

RandyHarr commented 2 years ago

If the file is 44 GB, then it is most likely a 30x WGS result, so there must be some other error if the tool is not reporting the proper statistics. Do you see anything odd in the command log window? What do the other stats in the Stats window show?

NathanTBurgess commented 2 years ago

It worked this time. I think there was an issue with my first download of either the program or the reference genomes. It turned out to be 41x on average. Generous Nebula. Thanks for all of your help, even though I didn't use the proper channel of communication.

NathanTBurgess commented 2 years ago

I was doing this to check whether the program I wrote to convert a VCF file into a 23andMe TXT file worked. Mine was much sparser. :( I was using the v3 23andMe blank template. I basically created dictionaries, and then, for each entry in the 23andMe template that was also in the dictionary built from the VCF file, I wrote a line to a file in the 23andMe format. I built the dictionary keys as "chr" + the chromosome number or letter + 0 (if the chromosome number is below 10) + the position. I generated the VCF text file using bcftools. No need to respond to this, but if any corrections to my thinking come to mind, I am all ears.
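A sketch of the dictionary approach described above, with two changes that are assumptions rather than anything from the thread. Keys are (chromosome, position) tuples after stripping any `chr` prefix, so matching does not depend on padding or separators in string keys (without them, "chr" + "1" + "12345" and "chr" + "11" + "2345" both give "chr112345"). Template positions with no usable VCF call are written as `--` instead of being skipped; a position missing from a variants-only VCF is often, but not necessarily, homozygous reference, so `--` is the conservative choice. It assumes a tab-separated template whose first three columns are rsid, chromosome and position, and reads only the first sample's GT field. Both files must also be on the same reference build; 23andMe raw data is typically on build 37 (GRCh37), so calls made against GRCh38 would need liftover first.

```python
def normalize_chrom(chrom):
    """'chr1' -> '1', 'chrX' -> 'X', 'chrM'/'M' -> 'MT' (23andMe naming)."""
    c = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return "MT" if c in ("M", "MT") else c

def load_vcf_genotypes(vcf_lines):
    """Map (chrom, pos) -> genotype string for single-base calls in sample 1."""
    calls = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # meta-information and column header lines
        f = line.rstrip("\n").split("\t")
        if len(f) < 10:
            continue  # sites-only line: no FORMAT/sample columns
        chrom, pos, ref, alt = f[0], int(f[1]), f[3], f[4]
        gt = dict(zip(f[8].split(":"), f[9].split(":"))).get("GT", ".")
        idx = gt.replace("|", "/").split("/")
        if "." in idx:
            continue  # no-call
        alleles = [ref] + alt.split(",")
        bases = [alleles[int(i)] for i in idx]
        if all(b in {"A", "C", "G", "T"} for b in bases):  # SNVs only
            calls[(normalize_chrom(chrom), pos)] = "".join(bases)
    return calls

def fill_template(template_lines, calls):
    """Write a genotype into each template row; '--' where there is no call."""
    rows = []
    for line in template_lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            rows.append(line)  # keep header/comment lines unchanged
            continue
        if not line.strip():
            continue
        rsid, chrom, pos = line.split("\t")[:3]
        gt = calls.get((normalize_chrom(chrom), int(pos)), "--")
        rows.append("\t".join([rsid, chrom, pos, gt]))
    return rows
```

Counting the `--` rows in the output is a quick way to see how much of the template the VCF actually covers.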

NathanTBurgess commented 2 years ago

I am realizing now that the VCF file I generated does not contain most of the locations in the 23andMe file. Maybe generating one from the CRAM file would work.
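If the VCF is regenerated from the CRAM, one way to avoid the gaps is to restrict calling to the template's positions and keep sites that match the reference. A sketch that turns template rows into two-column CHROM/POS lines, the tab-delimited positions format bcftools accepts as a targets file (`bcftools mpileup -T`). The `chr` prefix and the renaming of `MT` to `chrM` are assumptions about how the CRAM names its contigs and should be checked against its header; the `bcftools call` step would also need to run without `-v/--variants-only`, since that option drops the reference-matching sites. The template rows in the usage block are made up.

```python
def template_to_targets(template_lines, prefix="chr"):
    """Convert 23andMe template rows (rsid, chromosome, position, ...) into
    CHROM<TAB>POS lines for use as a bcftools targets file."""
    targets = []
    for line in template_lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip header/comment and blank lines
        _rsid, chrom, pos = line.split("\t")[:3]
        if prefix and chrom == "MT":
            chrom = "M"  # assumption: chr-prefixed CRAMs name it chrM
        targets.append(f"{prefix}{chrom}\t{pos}")
    return targets

if __name__ == "__main__":
    template = ["# rsid\tchromosome\tposition", "rs1\t1\t100", "rs2\tMT\t73"]
    print("\n".join(template_to_targets(template)))
```

Pass `prefix=""` for a CRAM whose contigs are named `1`, `2`, ..., `MT` without the `chr` prefix.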