WGSExtract / WGSExtract-Dev

WGS Extract Developers Repository
GNU General Public License v3.0

Microarray extraction zip path error on Windows #11

Closed jkingsman closed 7 months ago

jkingsman commented 10 months ago

Howdy! Thanks so much for an amazing program.

I've replicated on both Beta and Alpha v4.44.2 on Win11.

I'm trying to run a Microarray-raw extraction, in Everything mode with no other output formats. My relevant CLI output:

--- Exec: ButtonCombinedKit.sh, started @ Fri Nov  3 14:11:47 2023
+ D:/WGSExtractv4_alpha/cygwin64/usr/local/bin/bcftools.exe mpileup -B -I -C 50 -T D:/WGSExtractv4_alpha/reference/microarray/All_SNPs_hg38_ref.tab.gz -f D:/WGSExtractv4_alpha/reference/genomes/hs38d1.fna.gz -Ou D:/NG1NUBUSQA.mm2.sortdup.bqsr.cram
+ D:/WGSExtractv4_alpha/cygwin64/usr/local/bin/bcftools.exe call --ploidy-file D:/WGSExtractv4_alpha/reference/microarray/ploidy.txt -V indels -m -P 0 --threads 16 -Oz -o D:/WGSExtractv4_alpha/temp/9628/CombKit_called.vcf.gz
[mpileup] 1 samples in 1 input files
[mpileup] maximum number of reads per input file set to -d 250
+ D:/WGSExtractv4_alpha/cygwin64/usr/local/bin/tabix.exe -p vcf D:/WGSExtractv4_alpha/temp/9628/CombKit_called.vcf.gz
+ D:/WGSExtractv4_alpha/cygwin64/usr/local/bin/bcftools.exe annotate -Oz -a D:/WGSExtractv4_alpha/reference/microarray/All_SNPs_hg38_ref.tab.gz -c CHROM,POS,ID D:/WGSExtractv4_alpha/temp/9628/CombKit_called.vcf.gz
+ D:/WGSExtractv4_alpha/cygwin64/usr/local/bin/tabix.exe -p vcf D:/WGSExtractv4_alpha/temp/9628/CombKit_annotated.vcf.gz
+ D:/WGSExtractv4_alpha/cygwin64/usr/local/bin/bcftools.exe query -f '%ID       %CHROM  %POS[   %TGT]
' D:/WGSExtractv4_alpha/temp/9628/CombKit_annotated.vcf.gz -o D:/WGSExtractv4_alpha/temp/9628/CombKit_result.tab
+ D:/WGSExtractv4_alpha/cygwin64/bin/sed.exe 's/chr//; s/       M       /       MT      /g; s/\///; s/\.\.$/--/; s/TA$/AT/; s/TC$/CT/; s/TG$/GT/; s/GA$/AG/;     s/GC$/CG/; s/CA$/AC/' D:/WGSExtractv4_alpha/temp/9628/CombKit_result.tab
+ D:/WGSExtractv4_alpha/cygwin64/bin/sort.exe -t '      ' -k2,3 -V
+ D:/WGSExtractv4_alpha/cygwin64/bin/cat.exe D:/WGSExtractv4_alpha/reference/microarray/raw_file_templates/head/23andMe_V3.txt D:/WGSExtractv4_alpha/temp/9628/CombKit_result_sorted.tab
+ D:/WGSExtractv4_alpha/python/python.exe D:/WGSExtractv4_alpha/program/hg38tohg19.py D:/NG1NUBUSQA.mm2.sortdup.bqsr_CombinedKit hg38
Converting Microarray Build38 to HG19 positions to maintain compatibility...
Some positions failed to lift over (1 to AltContig, 38 not in new model)
--- Exec: LiftoverCleanup.sh, started @ Fri Nov  3 16:00:23 2023
+ D:/WGSExtractv4_alpha/cygwin64/bin/sort.exe -t '      ' -k2,3 -V D:/NG1NUBUSQA.mm2.sortdup.bqsr_CombinedKit.tmp
+ D:/WGSExtractv4_alpha/cygwin64/bin/cat.exe D:/WGSExtractv4_alpha/reference/microarray/raw_file_templates/head/23andMe_V3.txt D:/NG1NUBUSQA.mm2.sortdup.bqsr_CombinedKit-sorted.txt
--- SUCCESS:   1 seconds to run: LiftoverCleanup.sh (finished @ Fri Nov  3 16:00:24 2023
+ D:/WGSExtractv4_alpha/cygwin64/bin/zip.exe -j D:/NG1NUBUSQA.mm2.sortdup.bqsr_CombinedKit.zip D:/NG1NUBUSQA.mm2.sortdup.bqsr_CombinedKit.txt
zip I/O error: No such file or directory
zip error: Temporary file failure (D:/zismf8Vj)
--- SUCCESS: 1.8 hours to run: ButtonCombinedKit.sh (finished @ Fri Nov  3 16:00:25 2023
--- Exec: ButtonMicroarrayDNA.sh, started @ Fri Nov  3 16:00:26 2023
--- SUCCESS:   0 seconds to run: ButtonMicroarrayDNA.sh (finished @ Fri Nov  3 16:00:26 2023
Exception in Tkinter callback
Traceback (most recent call last):
  File "D:\WGSExtractv4_alpha\python\lib\tkinter\__init__.py", line 1921, in __call__
    return self.func(*args)
  File "D:\WGSExtractv4_alpha\program\microarray.py", line 541, in button_generate_selected_autosomal
    os.path.getsize(CombinedKitZIP_oFN) < minCbnKitSize:
  File "D:\WGSExtractv4_alpha\python\lib\genericpath.py", line 50, in getsize
    return os.stat(filename).st_size
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'D:\\NG1NUBUSQA.mm2.sortdup.bqsr_CombinedKit.zip'

The zip it's reaching for doesn't exist anywhere due to the creation failure, which happens due to a slash issue in the output parameter:

Works (backslash in target zip name):

D:\>D:/WGSExtractv4_alpha/cygwin64/bin/zip.exe D:\NG1NUBUSQA.mm2.sortdup.bqsr_CombinedKit.zip D:/NG1NUBUSQA.mm2.sortdup.bqsr_CombinedKit.txt
  adding: D:/NG1NUBUSQA.mm2.sortdup.bqsr_CombinedKit.txt (deflated 66%)

Fails (forward slash in target zip name, as executed by script) -- temporary file error is a red herring:

D:\>D:/WGSExtractv4_alpha/cygwin64/bin/zip.exe -j D:/NG1NUBUSQA.mm2.sortdup.bqsr_CombinedKit.zip D:\NG1NUBUSQA.mm2.sortdup.bqsr_CombinedKit.txt
zip I/O error: No such file or directory
zip error: Temporary file failure (D:/zieXjnMh)

However, I do get my output file, NG1NUBUSQA.mm2.sortdup.bqsr_CombinedKit.txt, which looks 23andMe-esque.

I'm not super concerned with the error, but from reading through the Python it looks like this is dying on a cleanup operation, and I wanted to validate that understanding: when the failure happens at this stage, is the .txt output complete and not subject to further processing? Basically, I'd like to confirm that this file is the last stage and is suitable for cases where I want the full microarray, even though cleanup failed.

Does that seem right?

Thanks for confirming!

RandyHarr commented 10 months ago

The .txt file should be complete.

Not sure why the cygwin64 zip program is failing.

The disk can always be specified as D: or /cygdrive/d -- either works when in a Linux command shell like BASH.

With Cygwin64 you have to use either forward slashes only or doubled backslashes throughout. The program uses only forward slashes. Your examples seem to mix the two.

What type of slash is used depends on what command shell you are in and what program is interpreting the command line parameters (the shell, the program itself, etc). It is a real headache with WSL because you have to use DOS conventions for parameters except those passed to a Linux program. So you have to mix methods in one command line.
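As an illustration of keeping one slash convention per command line, here is a minimal Python sketch. It is not taken from WGS Extract; the function names to_forward_slashes and to_cygdrive are hypothetical, and to_cygdrive assumes Cygwin's default /cygdrive/<letter> mount prefix.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(win_path: str) -> str:
    """Return a Windows path with every backslash turned into a forward slash."""
    return PureWindowsPath(win_path).as_posix()


def to_cygdrive(win_path: str) -> str:
    """Convert an absolute drive-letter path to Cygwin's /cygdrive form.

    Assumes the default /cygdrive prefix; UNC and relative paths are rejected.
    """
    path = PureWindowsPath(win_path)
    if len(path.drive) != 2 or not path.root:
        raise ValueError(f"not an absolute drive-letter path: {win_path!r}")
    drive_letter = path.drive[0].lower()
    rest = path.as_posix()[len(path.drive):]  # e.g. "/data/kit.txt"
    return f"/cygdrive/{drive_letter}{rest}"
```

PureWindowsPath only manipulates the string, so the sketch behaves the same on any operating system and never touches the file system.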

I will work to make the program detect the issue and fail more gracefully. If you leave the CombinedKit file there, it will be used to generate the other vendor files you request, which takes about 1 minute each.
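The traceback in the report comes from calling os.path.getsize on a zip file that was never created. A minimal sketch of a more graceful check follows. It is only an illustration, not the fix that was actually applied; combined_kit_is_usable is a hypothetical name, and min_size stands in for the minCbnKitSize value seen in the traceback.

```python
import os


def combined_kit_is_usable(zip_path: str, min_size: int) -> bool:
    """Return True only if zip_path exists and is at least min_size bytes.

    A missing or unreadable file is reported as unusable instead of raising.
    """
    try:
        return os.path.getsize(zip_path) >= min_size
    except OSError:  # FileNotFoundError is a subclass of OSError
        return False
```

A caller could then show an error message and fall back to the .txt file when this returns False, rather than letting the exception escape into the Tkinter callback.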

jkingsman commented 10 months ago

Ah, well, it appears then that the zip utility does not care for the forward slashes in the destination, at least. I was just showing the minimum viable resolution in my example; it's been a while since I've had to remember which cygwin utilities will make peace with different slashes or not haha.

Anyway, not a huge issue, just wanted to give a heads up that n=1 for the convention failing on the zip invocation.
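One way to sidestep the external zip utility and its path handling would be to build the archive from Python itself. This is a minimal sketch under that assumption, not what WGS Extract does; zip_single_file is a hypothetical name. It stores only the file's base name, mirroring the -j (junk paths) flag in the logged command.

```python
import os
import zipfile


def zip_single_file(txt_path: str, zip_path: str) -> int:
    """Compress txt_path into zip_path and return the archive size in bytes.

    Only the base name is stored in the archive, similar to `zip -j`.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(txt_path, arcname=os.path.basename(txt_path))
    return os.path.getsize(zip_path)
```

Because zipfile opens the destination through Python's own file handling, either slash style in zip_path is accepted on Windows.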

RandyHarr commented 10 months ago

I will definitely look into it further. Just to confirm, do you have the installation on D:/WGSExtractv4_alpha and the output directory (and maybe source data files) set to D:/ ?

What has me scratching my head is that I develop on Windows in the Cygwin64 environment, so I test there first and most. I have likely generated the CombinedKit over a thousand times there and have never seen the issue. I have all the past releases installed on both Win11 and Win10, on both C: and D: drives. (There is a bug in the FASTQC Java program when run on Windows where it bombs out if the data files are on a different drive than the Java installation. So that is one of my regression tests ;)

FYI, the only difference between Beta, Alpha and Dev is that specific term in the release.json file. It is used to select one of the many "latest-release" json files, each of which defines the dictionary of 6 other packages and their versions. So v4.44.2 is the same in all tracks. Beta and Alpha just tend to lag, depending on how much testing has occurred.

And you are always welcome to apply your knowledge to fixing or expanding the features :) FYI, if you turn debug on, then all BASH scripts generated are kept in the temp folder. You can view, edit, run them again yourself. Just make sure to use the cygwin BASH and not the ancient one delivered with Windows.

RandyHarr commented 10 months ago

Just as a follow-up, I took the ButtonCombinedKit.sh file from the temp/ folder and extracted the command:

"C:/WGSE/Betav4.44p2/cygwin64/bin/zip.exe" -j "D:/DNA/WGS/Randy Harr/WGSEv4/60820188481027_CombinedKit.zip" "D:/DNA/WGS/Randy Harr/WGSEv4/60820188481027_CombinedKit.txt"

Running that in a Cygwin64 BASH shell on its own works just fine. I can see no issue, though I wonder whether you ran out of disk space on D:/ when executing the final zip.exe command. I check for free disk space before major commands like BWA or SAMTOOLS SORT, but not for simple items like this.
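A free-space check before the zip step could look like the sketch below. It is an assumption about how such a check might be written, not the program's existing logic; has_room_for_zip and safety_factor are hypothetical names. It conservatively requires as much free space as the uncompressed .txt file occupies.

```python
import os
import shutil


def has_room_for_zip(txt_path: str, safety_factor: float = 1.0) -> bool:
    """Return True if the drive holding txt_path has enough free space.

    The requirement is the size of the .txt file times safety_factor, which
    overestimates what the compressed archive needs.
    """
    needed = os.path.getsize(txt_path) * safety_factor
    out_dir = os.path.dirname(os.path.abspath(txt_path))
    free = shutil.disk_usage(out_dir).free
    return free >= needed
```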

Note that it will also try to run the zip program on the individual vendor files it creates. Did those work for you? Here is my entry from ButtonMicroarrayDNA.sh that zips the vendor files:

"C:/WGSE/Betav4.44p2/cygwin64/bin/zip.exe" -mj "D:/DNA/WGS/Randy Harr/WGSEv4/60820188481027_23andMe_V3.zip" "D:/DNA/WGS/Randy Harr/WGSEv4/60820188481027_23andMe_V3.txt"

Here is the failure message from the zip program you report, which I am not familiar with:

zip I/O error: No such file or directory
zip error: Temporary file failure (D:/zismf8Vj)

I am guessing it creates a temp file where the final file will be located and simply renames it once finished. So maybe it had trouble creating the file there for some reason.
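If that guess is right, one way to test it would be to probe whether a temporary file can be created in the output directory at all. The sketch below does that from Python. It is a diagnostic idea only; can_create_temp_file is a hypothetical name, and the "zi" prefix merely imitates the temp file names in the reported error message.

```python
import tempfile


def can_create_temp_file(directory: str) -> bool:
    """Return True if a temporary file can be created (and removed) in directory."""
    try:
        # The file is deleted automatically when the with-block exits.
        with tempfile.NamedTemporaryFile(dir=directory, prefix="zi"):
            pass
    except OSError:
        return False
    return True
```

Note that this exercises Python's file handling rather than the Cygwin zip program's, so a True result rules out permissions or a missing directory but not a path-interpretation problem inside zip.exe.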

jkingsman commented 10 months ago

So sorry -- I would dig into this further but there's just been a family crisis and I'm leaving for emergency travel. I will come back to this when I'm able. Apologies for reporting and dashing off!