ababaian / LIONS

LIONS is a bioinformatic analysis pipeline which brings together a few pieces of software and some home-brewed scripts to annotate a paired-end RNAseq library and detect TE-initiated transcripts.
GNU General Public License v3.0

Fail to parse input.list #15

Status: Closed (Alex-Nesta closed 4 years ago)

Alex-Nesta commented 4 years ago

Hi again,

I am unable to get LIONS to accept my input.list file. I have tried entering the tabs manually and also entering the values in three separate columns in Excel. I get the same error each time.

Here is my input.list file:

Here is the error:

awk: /projects/beck-lab/alex/bin/LIONS/scripts/Initialize/input.list.awk:16:   a[s][t] = $0; 
awk: /projects/beck-lab/alex/bin/LIONS/scripts/Initialize/input.list.awk:16:       ^ syntax error
awk: /projects/beck-lab/alex/bin/LIONS/scripts/Initialize/input.list.awk:21:     for ( j in a[i]) {
awk: /projects/beck-lab/alex/bin/LIONS/scripts/Initialize/input.list.awk:21:                 ^ syntax error
awk: /projects/beck-lab/alex/bin/LIONS/scripts/Initialize/input.list.awk:22:       print a[i][j]
awk: /projects/beck-lab/alex/bin/LIONS/scripts/Initialize/input.list.awk:22:                 ^ syntax error

This happens even using the default input.list file included with LIONS.

gm12878 /LIONS-data/GM12878.rep1.R1.fastq.gz,/LIONS-data/GM12878.rep1.R2.fastq.gz   1
k562    /LIONS-data/K562.rep1.R1.fastq.gz,/LIONS-data/K562.rep1.R2.fastq.gz 2

Not sure if it is due to a recent change in the code? I checked out the latest revision yesterday and installed on top of my existing one by copy and paste overwriting. Maybe that was a bad idea. I can try a clean install tomorrow.

EDIT: Same issue on fresh install. I commented on the commit where the input.list parsing last changed. Hoping you have time to look into this soon. For now I will try to check out an older commit.

biscuit13161 commented 4 years ago

Hi Alex,

Can you let me know what version of awk and what operating system you are using?

Can you send me the input.list.awk and input.list files you're using?

Best, Richard

ababaian commented 4 years ago

Having a bit of trouble reproducing as well, can you provide more info Alex?

Alex-Nesta commented 4 years ago

Hey guys,

Thanks for looking into this. I am running LIONS on my compute cluster, so I am not using Docker.

Here are my awk and Linux kernel versions:


(base) [nestaa@helix LIONS]$ awk --version
GNU Awk 3.1.7
Copyright (C) 1989, 1991-2009 Free Software Foundation.

(base) [nestaa@helix LIONS]$ uname -r
2.6.32-754.3.5.el6.x86_64

Sorry, I had to compress the input.list file to keep it totally untouched and still able to upload. It's literally the git cloned version though. I did NOTHING to it.

input.list.zip

EDIT: I just tried updating to the latest version of awk... No change in the error.


(base) [nestaa@helix gawk-5.0.1]$ ./gawk --version
GNU Awk 5.0.1, API: 2.0
Copyright (C) 1989, 1991-2019 Free Software Foundation.
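
A possible explanation for why building gawk 5.0.1 made no difference, offered as an assumption rather than something confirmed in this thread: the `a[s][t]` syntax in the error is gawk's arrays-of-arrays feature, which arrived in gawk 4.0, and a freshly built `./gawk` only helps if the pipeline actually calls it rather than the system `awk` found on `PATH`. A minimal probe to check this (the function name `probe_nested_arrays` is made up for this sketch):

```shell
# Hypothetical helper: prints "yes" if the given awk binary parses
# arrays of arrays (a[i][j]), "no" otherwise.
probe_nested_arrays() {
  if "$1" 'BEGIN { a[1][2] = 1 }' >/dev/null 2>&1; then
    echo yes
  else
    echo no
  fi
}

# Which awk does a plain `awk` call resolve to, and can it parse a[i][j]?
command -v awk || echo "awk not found on PATH"
probe_nested_arrays awk
```

GNU Awk 3.1.7 predates arrays of arrays, so it should report `no`; running `probe_nested_arrays ./gawk` from the build directory would show whether the new binary behaves differently.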

Another EDIT:

I checked out an older branch of LIONS and it is working fine for me. I'll be happy to test any revisions you make though.


(base) [nestaa@helix bin]$ git clone https://github.com/ababaian/LIONS.git
Initialized empty Git repository in /projects/beck-lab/alex/bin/LIONS/.git/
remote: Enumerating objects: 19, done.
remote: Counting objects: 100% (19/19), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 947 (delta 8), reused 18 (delta 7), pack-reused 928
Receiving objects: 100% (947/947), 21.94 MiB | 20.99 MiB/s, done.
Resolving deltas: 100% (532/532), done.
(base) [nestaa@helix bin]$ cd LIONS
(base) [nestaa@helix LIONS]$ git checkout -b branchtofixawak 62e2f065bd10a0b6672efb2217d0d4df6fe63f58
Switched to a new branch 'branchtofixawak'

Now LIONS can correctly parse the input.list file (I don't have these files, so obviously it throws an error):


 ---------- Set-up Project Workspace ---------- 
 Initializing sandbox Directory: /projects/beck-lab/alex/bin/LIONS/projects/sandbox
/projects/beck-lab/alex/bin/LIONS/controls/input.list /projects/beck-lab/alex/bin/LIONS/controls/parameter.ctrl
 ERROR 7B: Input File Not Accesible (Fastq)
 files: /LIONS-data/GM12878.rep1.R1.fastq.gz ; /LIONS-data/GM12878.rep1.R2.fastq.gz
  a) The non-bam input file (.fq_1 & .fq_2) isnt found
  b) If youre using FASTQ; make sure youre listing two
     files in the input.list file seperated by a comma

biscuit13161 commented 4 years ago

Hi Alex,

The latest version works for me in gawk (I'm using both v4.0.1 and v5.0.1). The awk on the Mac does seem to be an issue, but this file (input.list.awk.gz) should correct that. Please try it with the latest version and let me know how it goes.

FYI, the reason the earlier version is working is that it doesn't use awk ... the input.list.awk script was added to correct a bug relating to working with replicates.

thanks, Richard

Alex-Nesta commented 4 years ago

Hi Richard,

I just tested with the new input.list.awk file you provided. It looks like it is working! Thank you!

For your reference, here is the old input.list.awk that did not work:

input.list.awk.bak.zip
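
Neither version of input.list.awk is reproduced in this thread, so the following is only a sketch of one portable way to get the same kind of two-key grouping that `a[s][t]` provides. It assumes the rows are keyed by the third column (group) and the first column (name), which is a guess; the variable names `list` and `names` are made up. The idea is to use a single flat key `a[s, t]`, which awk joins with `SUBSEP` and which POSIX awk, gawk 3.x, and BSD awk all accept.

```shell
# Stand-in for controls/input.list, using the two default rows shown earlier.
list=$(mktemp)
printf 'gm12878\t/LIONS-data/GM12878.rep1.R1.fastq.gz,/LIONS-data/GM12878.rep1.R2.fastq.gz\t1\n' > "$list"
printf 'k562\t/LIONS-data/K562.rep1.R1.fastq.gz,/LIONS-data/K562.rep1.R2.fastq.gz\t2\n' >> "$list"

names=$(awk -F '\t' '
  {
    a[$3, $1] = $0            # flat key "group SUBSEP name" instead of a[$3][$1]
  }
  END {
    for (k in a) {
      split(k, part, SUBSEP)  # part[1] is the group, part[2] is the name
      print part[1] "\t" part[2]
    }
  }' "$list" | sort)

rm -f "$list"
printf '%s\n' "$names"
```

Iteration order of `for (k in a)` is unspecified in awk, which is why the output is piped through `sort` here.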