COMBINE-lab / simpleaf

A rust framework to make using alevin-fry even simpler
BSD 3-Clause "New" or "Revised" License

Title: Issues with awk/gawk Command Errors in Step 5 of Data Processing Workflow in 10x-feature-barcode-antibody.jsonnet #107

Closed wanisajad closed 11 months ago

wanisajad commented 1 year ago

I'm getting errors when executing the 10x-feature-barcode-antibody.jsonnet workflow in simpleaf (on an Apple M1 machine). Specifically, the errors occur in Step 5, where the workflow uses awk or gawk for in-place file editing.

Initially, I tried using /usr/bin/awk and subsequently /opt/homebrew/bin/gawk after encountering the first error. Here are the detailed errors:

  1. Error with awk: When I replaced awk with /usr/bin/awk, the command failed. The error message indicated that the -i option is unknown and ignored, leading to a failure in opening the file with -v.

  2. Error with gawk: To bypass the initial issue, I switched to /opt/homebrew/bin/gawk, which accepts the -i option. However, a new error emerges during the process of in-place file editing, pointing to an issue with file positioning.

How can I resolve these errors and proceed with my workflow?

wanisajad commented 1 year ago

Update: I am using simpleaf 0.14.1, piscem 0.6.2, alevin-fry 0.8.2, and salmon 1.10.2.

rob-p commented 1 year ago

Thanks for the report, @wanisajad! I'm pinging @DongzeHE here to help take a look at it.

DongzeHE commented 1 year ago

Hi @wanisajad could you please provide the actual error messages when using gawk? Thanks!

wanisajad commented 1 year ago

@DongzeHE, these are the last few lines, where the error occurs.

WARN simpleaf::simpleaf_commands::workflow: msg="/opt/homebrew/bin/gawk command at step 5 failed to exit with code 0 under the shell.
The exit status was: exit status: 2.
The stderr of the invocation was: gawk: inplace:66: (FILENAME=/Users/wanisd/simpleaf_workdir/workflow_output/gene_expression/simpleaf_quant/af_quant/alevin/quants_mat_rows.txt FNR=83480) fatal: inplace::end: fsetpos(stdout) failed (Illegal seek)."

Error: /opt/homebrew/bin/gawk command at step 5 failed to exit with code 0 under the shell.
The exit status was: exit status: 2.
The stderr of the invocation was: gawk: inplace:66: (FILENAME=/Users/wanisd/simpleaf_workdir/workflow_output/gene_expression/simpleaf_quant/af_quant/alevin/quants_mat_rows.txt FNR=83480) fatal: inplace::end: fsetpos(stdout) failed (Illegal seek)

rob-p commented 1 year ago

Any thoughts on this @DongzeHE ?

DongzeHE commented 1 year ago

Hi @wanisajad, I could not reproduce the error. gawk worked well on my M1 Mac. If possible, could you please share this file, /Users/wanisd/simpleaf_workdir/workflow_output/gene_expression/simpleaf_quant/af_quant/alevin/quants_mat_rows.txt? Thanks so much.

wanisajad commented 1 year ago

@Dongze There are two files, quants_mat_rows.txt and quants_mat_rows.txt.bkp. quants_mat_rows.txt is empty (0 bytes), so it will not be attached. Here is quants_mat_rows.txt.bkp.

DongzeHE commented 1 year ago

Sorry, I did not see the attachment. You can send it to my email, dhe17@umd.edu. Thanks.

wanisajad commented 1 year ago

@Dongze Please check your email. Thanks.

DongzeHE commented 1 year ago

Hi @wanisajad,

I tried the file you sent to me. It worked without any errors.

➜ gawk -i inplace -v inplace::suffix='.bkp' 'FNR==NR {dict[$1]=$2; next} {$1=($1 in dict) ? dict[$1] : $1}1' 3M-february-2018.txt quants_mat_rows.txt
➜  which gawk
/opt/homebrew/bin/gawk
➜  gawk --version
GNU Awk 5.2.2, API 3.2, (GNU MPFR 4.2.0-p12, GNU MP 6.2.1)

If possible, could you please run the following command directly in your terminal?

gawk -i inplace -v inplace::suffix='.bkp' 'FNR==NR {dict[$1]=$2; next} {$1=($1 in dict) ? dict[$1] : $1}1' 3M-february-2018.txt.bkp quants_mat_rows.txt.bkp

To find the two files used in the above command, you can open /Users/wanisd/simpleaf_workdir/workflow_output/workflow_execution_log.json and locate the JSON fields for step 5 (using /"Step": 10 in less, for example); the paths of the two files will show up there. Note that because the command edits files in place, you need to add .bkp to the end of each file name, as the original files have already been modified.
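For readers unsure what the awk program in this command does, here is a minimal sketch of the same first-column translation written for any POSIX awk. It avoids the gawk-only `-i inplace` extension by writing to a temporary file and then moving it into place. The function name `translate_first_column` is hypothetical, and this is an illustration under those assumptions, not how simpleaf itself runs step 5.

```shell
#!/bin/sh
# Hypothetical helper (not part of simpleaf): replace the first field of
# each line in TARGET with its mapped value from MAP, keeping a .bkp copy.
# MAP has two whitespace-separated columns: old_value new_value.
translate_first_column() {
    map=$1
    target=$2
    tmp=$(mktemp "${target}.XXXXXX") || return 1

    # While reading the first file (FNR==NR), build the lookup table.
    # For the second file, swap $1 when it has an entry, then print the line.
    if awk 'FNR==NR {dict[$1]=$2; next} {$1=($1 in dict) ? dict[$1] : $1}1' \
        "$map" "$target" > "$tmp"
    then
        # Keep the original as a backup, then install the rewritten file.
        cp "$target" "${target}.bkp" && mv "$tmp" "$target"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Calling `translate_first_column 3M-february-2018.txt quants_mat_rows.txt` would rewrite the rows file and leave the original content in `quants_mat_rows.txt.bkp`, mirroring the `inplace::suffix='.bkp'` behaviour. Because the output goes to a regular temporary file, the sketch does not depend on how the calling process has set up stdout.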

Thanks!

DongzeHE commented 1 year ago

Hi @wanisajad,

Please let me know if you still have trouble running the command. Thanks.

Best, Dongze

wanisajad commented 11 months ago

Hi Dongze, Sorry, I think I am missing something. I am not able to see "quants_mat_rows.txt" or "quants_mat_rows.txt.bkp":

workflow_output % ls
3M-february-2018.txt 3M-february-2018.txt.gz 3M-february-2018.txt.bkp gene_expression

DongzeHE commented 11 months ago

Hi @wanisajad,

The layout of the output folder is the same as the layout defined in the workflow. So if you go to gene_expression -> simpleaf_quant -> af_quant -> alevin, you will see the files there.
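The files can also be located without walking the directories by hand. A minimal sketch, assuming the nested workflow_output layout described above; the function name `find_rows_files` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: list quants_mat_rows.txt and any backup copies
# (e.g. quants_mat_rows.txt.bkp) anywhere under the given output directory.
find_rows_files() {
    find "$1" -type f -name 'quants_mat_rows.txt*' | sort
}
```

For example, `find_rows_files /Users/wanisd/simpleaf_workdir/workflow_output` should print the full paths of both files if they exist.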