WGSExtract / WGSExtract.github.io

WGS Extract WWW home
https://WGSExtract.github.io/
GNU General Public License v3.0

Issue with Alignment Function (Convert grch38 BAM file to grch37 format) #10

Closed Emw456 closed 2 years ago

Emw456 commented 2 years ago

Hello, I've been experiencing an issue with the "Align" function. After WGSE attempts to generate the two fastq files, I usually get an error saying that there's an issue with my original BAM file.

FYI, I used sequencing.com to obtain my BAM file.

  1. Is there some way I can try the "align" function on the two fq.gz files that I received from sequencing.com instead of generating new fastq files? If so, how can I proceed with this?
  2. WGSE stated that there was an issue in the header of my BAM file. How can I resolve this?

RandyHarr commented 2 years ago

What version / date of the WGSE tool are you using?

There is an "Align" and "Unalign" button on the third tab. Those two functions are called internally to form the "Realign" button you may be trying to use. "Align" starts with FASTQs.

If sequencing.com did your original sequencing, note that they use a custom reference model they developed themselves, a variation of hs38a. It is available in the v4 tools but not in v3.

If it is truly a GRCh model BAM, then there may be an issue. The GRCh model cannot be processed by BWA (for as yet unknown reasons), so it requires a different alignment tool. I believe the tool has that set up correctly, but not all platforms have the other alignment tools available.

It is best to capture the command log screen and post here (or as a .txt file). If possible, turn on DEBUG MODE before running to provide additional information. Often a "header" error is caused by some other error before that.

Emw456 commented 2 years ago

Thanks for your comment! I appreciate the clarification. I recently downloaded v4 and am testing it on my sequencing.com BAM file right now. I also turned on "DEBUG" mode just in case. If I want to realign my original hs38s (by Sequencing) BAM file to the preferred format for text file conversion (hs37d5), do I download reference genome 7 or 16? In addition, are there other reference genomes that I need to download for the realignment to succeed?

[image: Screen Shot 2022-08-07 at 10 12 46 AM]

RandyHarr commented 2 years ago

(7) is from the NIH server; (16) is from the EBI server in the UK. Your proximity to the server does not determine which one works best; what matters is your ISP and which internet caches it supports. Try 7. If it gets errors, try 16, or vice versa.
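The "try one, then the other" advice can be sketched as a small fallback loop. This is an illustration only, not how WGS Extract downloads references: `fetch_with_fallback` and the fetcher callables are hypothetical names, and no real server URLs are assumed.

```python
def fetch_with_fallback(fetchers):
    """Try each (name, fetch) pair in order and return the first success.

    `fetchers` is a list of (name, callable) pairs. Each callable is a
    hypothetical stand-in for "download the reference from this server"
    and is expected to raise OSError on a network or file error.
    """
    errors = []
    for name, fetch in fetchers:
        try:
            return name, fetch()
        except OSError as exc:
            # Remember why this source failed, then move on to the next one.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all sources failed: " + "; ".join(errors))
```

Passing the sources as `[("7", fetch_from_nih), ("16", fetch_from_ebi)]` (both callables being placeholders you would supply) tries 7 first; reversing the list gives the "vice versa" order.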

Realign only needs the target reference genome, unless you start from a CRAM. A CRAM also needs the reference genome it was compressed with.
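As a rough sketch of that rule (not WGS Extract code), the helper below lists which reference genomes have to be on disk for a realignment. The function name is hypothetical, and detecting a CRAM by its file extension is a simplifying assumption.

```python
def required_references(input_path, source_ref, target_ref):
    """Return the reference genomes needed to realign `input_path`.

    Assumption: a CRAM is recognized by its ".cram" extension. A BAM
    needs only the target reference; a CRAM additionally needs the
    reference it was compressed with.
    """
    refs = [target_ref]
    if input_path.lower().endswith(".cram") and source_ref != target_ref:
        refs.append(source_ref)
    return refs
```

For the case in this thread, a BAM on hs38s realigned to hs37d5, the call `required_references("sample.bam", "hs38s", "hs37d5")` yields only the target, `["hs37d5"]`. The file name `sample.bam` is a placeholder.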

So I take it you did not have a GRCh based reference genome BAM file (the reference from the EBI servers) after all? Should we consider the issue solved and closed?

Emw456 commented 2 years ago

Thanks for the comment! I am currently running the "Align" function and waiting for any outcomes. Since Terminal is still running, I cannot conclude at the moment if everything is working as it should for me. However, for now, we can consider the issue solved and closed until I decide to resurface anything.

Emw456 commented 2 years ago

Hello Randy,

I wanted to update you to let you know that while the two fastq files have successfully been downloaded to my desktop, I'm still experiencing issues with creating the new BAM file (hs37d5 reference). Do you know why this is the case? Here is a screenshot of my terminal and debug comments. Thanks.

[image: Screen Shot 2022-08-08 at 12.52.31 AM.png]

Best,

Elizabeth Wang


RandyHarr commented 2 years ago

Your screenshot did not come through (on the website or the email sent to me directly). Can you try to attach / send again?

Emw456 commented 2 years ago

Hello Randy,

Here is the screenshot. I'm responding via Gmail. Thanks for your help!

Sincerely,

Elizabeth Wang


RandyHarr commented 2 years ago


Still nothing. You successfully posted a screenshot yesterday with your original question. Not sure what you are doing differently. Maybe try on the website directly?

Emw456 commented 2 years ago

I apologize for the inconvenience. I tried again just now using the same method. I'm not sure why the screenshots after my first one did not show up either.

Just in case, I also attached an imgur link here: https://imgur.com/iwwwFKT

[image: terminal]

RandyHarr commented 2 years ago

That finally worked (to get the image). The errors are likely all due to the same thing: a permission error when trying to run the BWA executable. BWA (the aligner) will take 8 hours on the absolute best Apple x86_64 based machine, but more likely 3-4 days on a more typical machine, especially an M1. Creating the indices of the reference with BWA takes 2-3 hours. If you notice, your create_align_indices script finished in 0 seconds. What I see in the above is that every time the tool tries to run BWA, it is told "/opt/local/bin/bwa: permission denied".

BWA does not get installed from MacPorts because they do not have it there. So the installer finds it at https://github.com/smikkelsendk/bwa-for-arm/raw/master/bin/ (both M1 and x86 versions), downloads the appropriate one, sets the permission to execute, and then moves it to /opt/local/bin as if installed by MacPorts. While this works fine on my M1 and x86 Macs, it clearly is giving issues on yours. Can you verify that the file /opt/local/bin/bwa exists? If you can, try to run it in a terminal by simply typing that same path/name. You can also try, in a terminal, the command "chmod a+x /opt/local/bin/bwa" to give it execute permissions. But I wonder if macOS Gatekeeper is getting involved and not allowing it to run. You could even delete /opt/local/bin/bwa and rerun the installer. It should detect that it is missing and reinstall it.

I reopened the issue because this is still about the same problem of getting Alignment / Realignment to work.
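The checks described above can be sketched in Python. This is only an illustration and not part of WGS Extract or its installer: the helper name `ensure_executable` is hypothetical, and the path is the one quoted in the error message. It confirms the file exists and adds execute permission for all users, the equivalent of "chmod a+x". It cannot detect a Gatekeeper block, and changing a file under /opt/local/bin may require sudo.

```python
import os
import stat
from pathlib import Path


def ensure_executable(path):
    """Return True if `path` exists and is executable after the call.

    Hypothetical helper that mirrors `chmod a+x <path>`.
    Returns False when the file is missing (rerun the installer then).
    """
    p = Path(path)
    if not p.is_file():
        return False
    # Keep the existing permission bits and add execute for user/group/other.
    mode = stat.S_IMODE(p.stat().st_mode)
    p.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return os.access(p, os.X_OK)


if __name__ == "__main__":
    bwa = "/opt/local/bin/bwa"  # path taken from the error message
    try:
        ok = ensure_executable(bwa)
    except PermissionError:
        # Not the owner of the file: the chmod itself needs sudo.
        ok = False
        print("could not change permissions; try again with sudo")
    print(f"{bwa} is {'ready' if ok else 'missing or not executable'}")
```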

Emw456 commented 2 years ago

Sorry for the late response. Thank you so much for troubleshooting. It’s been a few years since I last used Linux so I appreciate your advice.

You are right that /opt/local/bin/bwa did not exist, thus the program terminated before the aligning process. I was able to fix this with some Linux code (i.e. “sudo chmod 755 /opt/local/bin/bwa” or something similar) to grant permission to read/access/change files.

Last but not least, how were you able to resolve any storage-related issues? The files uploaded to and/or generated by WGSE take up a lot of storage (nearly all my disk space). What methods work for you to combat this issue?

Since this issue is resolved, we can close this case afterwards.


RandyHarr commented 2 years ago

On my M1 Mac mini, for example, I made sure to buy the 2TB SSD version. I also have one of those adapters the Mac mini sits on, which gives more ports and has space for an additional internal 6TB SSD. You could also just connect an external SSD or even a spinning disk for the additional bulk storage. Ditto if using a MacBook. MacBook Pros with the Intel i9 processor, 64GB RAM and a 2TB internal SSD are optimum and faster than most DESKTOP Windows PCs.

It is important to NOT use external or network storage for the temp/ directory. That needs to be your highest performance "cache" area for bulk storage. And avoid wireless connections to any storage you use. Alignment of a typical 30x WGS will take 300 GB in the temp area and another 100-150 GB in the output directory. Your initial files, output directory and the WGS Extract program itself can reside on an attached disk if you have a higher performance interface to it.
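A quick way to check those numbers before starting a multi-day alignment is sketched below. It is not part of WGS Extract: the function names are hypothetical, the 300 GB and 150 GB figures are simply the estimates from this comment (using the upper end for the output directory), and the directories you pass in are your own choice.

```python
import os
import shutil

GB = 10**9
TEMP_GB = 300    # temp area estimate for a typical 30x WGS
OUTPUT_GB = 150  # upper end of the 100-150 GB output estimate


def enough_space(temp_free_gb, output_free_gb, same_volume):
    """Decide whether the free space covers the estimates above."""
    if same_volume:
        # One disk has to hold the temp and output data at the same time.
        return temp_free_gb >= TEMP_GB + OUTPUT_GB
    return temp_free_gb >= TEMP_GB and output_free_gb >= OUTPUT_GB


def check_dirs(temp_dir, output_dir):
    """Measure free space for two existing directories and apply the rule."""
    temp_free = shutil.disk_usage(temp_dir).free / GB
    output_free = shutil.disk_usage(output_dir).free / GB
    # Equal device ids mean both directories live on the same volume.
    same = os.stat(temp_dir).st_dev == os.stat(output_dir).st_dev
    return enough_space(temp_free, output_free, same)
```

Calling `check_dirs` with the temp directory on the internal SSD and the output directory on an attached disk matches the layout recommended above; it only reports capacity and says nothing about how fast the disks are.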

Bioinformatics really stresses most machines. BWA aligner, samtools sort, and bgzip compression do so the most.

The biggest consumer of space on a macOS install is the Xcode CLI tools. But both MacPorts and Python pip require them to bring in and locally compile some libraries during install. If you were to install the full Xcode environment, that takes 1 TB itself!

Randy (from phone, DYACs involved)

Emw456 commented 2 years ago

Thanks for your advice. I've been looking up external SSDs to purchase. So, to confirm, if I move my original BAM file, the output directory (e.g. a folder for storing the new/aligned BAM), and the WGSE application to an attached disk or external drive, will I be able to run the application without any performance-related issues?

"It is important to NOT use external or network storage for the temp/ directory. That needs to be your highest performance "cache" area for bulk storage. And avoid wireless connections to any storage you use."

I did not know this in advance so thanks for letting me know about this!
