AnimaTardeb / G4Hunter

G4Hunter (2012_2015)- IECB - Bordeaux
http://nar.oxfordjournals.org/content/44/4/1746
GNU General Public License v3.0

Recommended parameters for hg38? #5

Closed davetang closed 5 years ago

davetang commented 5 years ago

What would be the recommended parameters (window size and score) for running G4Hunter on hg38?

davetang commented 5 years ago

I tried -w 25 and -s 1 but I got this error.

python G4Hunter.py -i hg38.fa -o hg38 -w 25 -s 1
25

         Re-evaluation of G-quadruplex propensity with G4Hunter 

#####################################
#    New Results directory Created  #
#####################################

 Input file: hg38
Traceback (most recent call last):
  File "G4Hunter.py", line 296, in <module>
    ScoreListe, DNASeq, NumListe, HeaderListe=soft1.GFinder(filein, window)
  File "G4Hunter.py", line 91, in GFinder
    Sequence,liste=self.BaseScore(ListSeq[i])
  File "G4Hunter.py", line 108, in BaseScore
    liste[item]=2
IndexError: list assignment index out of range

AnimaTardeb commented 5 years ago

Hey Dave.

  1. Have you tried running G4Hunter on the genome provided (the human mitochondrial genome)?
  2. Why is there a 25 at the beginning of the second line of the output you provided?

davetang commented 5 years ago

Hey there.

  1. Yup I can run G4Hunter on the provided FASTA file.
  2. That's part of the output produced by G4Hunter.py.

My colleague told me that he fixed that problem by changing the code on line 107 from

if(item+1< len(line) and (line[item+1]=="G" or line[item+1]=="g")):

to

if(item+1< len(line) and (line[item+1]=="G" or line[item+1]=="g") and item<len(liste)):

Is that OK?

AnimaTardeb commented 5 years ago

It might be a solution. However, many people, myself included, have run this script on hg38 and HG37 without any modification. The problem is most likely that there are empty characters in the file. I will let you know.
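
One way to test the "empty characters" hypothesis is to count every character in the sequence lines that is not an expected nucleotide code. The sketch below is not part of G4Hunter.py: the function name unexpected_characters and the allowed alphabet are assumptions chosen for illustration, and it targets Python 3.

```python
from collections import Counter
import io


def unexpected_characters(handle, allowed="ACGTUNacgtun"):
    """Count characters in FASTA sequence lines that are not in `allowed`."""
    counts = Counter()
    for line in handle:
        if line.startswith(">"):      # header line, not sequence
            continue
        # Strip only "\n" so that spaces, tabs and "\r" are still reported.
        for ch in line.rstrip("\n"):
            if ch not in allowed:
                counts[ch] += 1
    return counts


# Toy input with a space, a Windows line ending and a stray "*".
example = io.StringIO(">chrTest\nACGT NNgg\r\nGGG\n\nCC*\n")
print(unexpected_characters(example))
```

For a real file, pass an open handle instead of the toy input. An empty counter would suggest that the input contains only the characters listed in allowed.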

Edit: item is just an index that runs from 0 to len(liste). Why did you add "item<len(liste)" only on this line? If the condition is mandatory, it should be on all of the lines.

I am afraid that this will change the score calculation and you will get false results.
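
To make the concern concrete, here is a minimal sketch of the per-base scoring scheme described in the G4Hunter paper: each G in a run of n consecutive Gs scores min(n, 4), each C in a run scores the negative of that, and every other character scores 0. This is not the BaseScore code from G4Hunter.py; the function name base_scores is hypothetical. Because the score list is pre-sized to the sequence length, an out-of-range assignment cannot occur, whatever characters the input contains.

```python
def base_scores(seq):
    """Per-base G4Hunter-style scores; len(result) always equals len(seq)."""
    scores = [0] * len(seq)          # pre-sized, one slot per input character
    s = seq.upper()
    i = 0
    while i < len(s):
        if s[i] in "GC":
            j = i
            while j < len(s) and s[j] == s[i]:
                j += 1               # j ends one past the run
            value = min(j - i, 4)    # run length, capped at 4
            if s[i] == "C":
                value = -value       # C runs score negatively
            scores[i:j] = [value] * (j - i)
            i = j
        else:
            i += 1                   # A, T, U, N, whitespace, ... keep score 0
    return scores


print(base_scores("GGGTNCCg"))  # [3, 3, 3, 0, 0, -2, -2, 1]
```

Comparing the output of such a reference function with the patched script on a short test sequence is one way to check whether the extra condition changes any scores.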

P.S. Running on the human genome is a little time-consuming.

AnimaTardeb commented 5 years ago

I suggest you use -w 25 and -s 1.5 if you want to screen the human genome.
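
As a rough illustration of what the two parameters control: the window size sets how many consecutive per-base scores are averaged, and the score threshold sets how large the absolute mean must be for a window to be reported. The sketch below assumes that behaviour and is not taken from G4Hunter.py; window_hits and the toy score list are hypothetical.

```python
def window_hits(scores, window=25, threshold=1.5):
    """Return (start, mean) for each window whose |mean score| >= threshold."""
    hits = []
    if len(scores) < window:
        return hits                  # sequence shorter than one window
    total = sum(scores[:window])     # running sum of the current window
    for start in range(len(scores) - window + 1):
        if start > 0:
            # Slide by one: add the new right-hand score, drop the old left one.
            total += scores[start + window - 1] - scores[start - 1]
        mean = total / window
        if abs(mean) >= threshold:   # G-rich (positive) or C-rich (negative)
            hits.append((start, mean))
    return hits


# Toy per-base scores; a real run would use -w 25 and -s 1.5.
toy = [3, 3, 3, 0, 3, 3, 3, 0, 0, 0]
print(window_hits(toy, window=4, threshold=2.0))
```

With a lower threshold such as 1, more windows pass the filter, so a whole-genome screen produces many more hits than at 1.5.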

Bedrat Amina dlF. https://github.com/AnimaTardeb/

davetang commented 5 years ago

Thank you for updating the code and for the recommendation. Just one more issue related to the output directory.

python G4Hunter.py -i chr22.fa -w 25 -s 1.5 -o tmp
Traceback (most recent call last):
  File "G4Hunter.py", line 277, in <module>
    OPF= os.listdir(outputfile)
OSError: [Errno 2] No such file or directory: 'tmp'

mkdir tmp
python G4Hunter.py -i chr22.fa -w 25 -s 1.5 -o tmp
Traceback (most recent call last):
  File "G4Hunter.py", line 292, in <module>
    os.makedirs(outputfile+"/"+DIR+"/", mode=0777)        #
NameError: name 'DIR' is not defined

mkdir tmp/Results_chr22
python G4Hunter.py -i chr22.fa -w 25 -s 1.5 -o tmp
true Results_chr22

         Re-evaluation of G-quadruplex propensity with G4Hunter 

#####################################
#    New Results directory Created  #
#####################################

 Input file: chr22

 Results files and Score Figure are created in:
tmp / Results_chr22 / 

AnimaTardeb commented 5 years ago

One last thing: for -o, just indicate where you want the results directory to go. For example, if you select PATH/TO/Document, G4Hunter will create the results directory at PATH/TO/Document/Results_nameoffile.

If you run different files separately, each one gets its own folder, named Results_nameoffile1, Results_nameoffile2, and so on:

./G4Hunter.py -i PATH/TO/FASTA -o PATH/TO/ALREADY/EXISTING/DIRECTORY -w 25 -s 1.5

davetang commented 5 years ago

Sorry, I thought the example was clear. You changed your code and it now looks for an output directory that is named: output directory + name of FASTA file without .fa. If that doesn't exist, the script will fail.

# output directory exists
mkdir tmp
python G4Hunter.py -i chr22.fa  -o tmp -w 25 -s 1.5
Traceback (most recent call last):
  File "G4Hunter.py", line 292, in <module>
    os.makedirs(outputfile+"/"+DIR+"/", mode=0777)        #
NameError: name 'DIR' is not defined

The script is looking for Results_chr22 inside of tmp, which doesn't exist. If I manually create this, the script works.

mkdir -p tmp/Results_chr22
python G4Hunter.py -i chr22.fa  -o tmp -w 25 -s 1.5
# runs fine
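
For reference, a minimal sketch of directory handling that would not depend on what already exists. It is written for Python 3, whereas the tracebacks above come from Python 2 (mode=0777 is Python 2 octal syntax). The helper name results_dir is hypothetical and this is not the code in G4Hunter.py; only the Results_<name> naming is taken from the output shown above.

```python
import os
import tempfile


def results_dir(output_root, fasta_path):
    """Create <output_root>/Results_<FASTA name without extension>."""
    name = os.path.splitext(os.path.basename(fasta_path))[0]
    path = os.path.join(output_root, "Results_" + name)
    # exist_ok=True: create missing parents, do nothing if already present.
    os.makedirs(path, exist_ok=True)
    return path


# Demonstration in a throwaway directory: "tmp" does not exist beforehand.
with tempfile.TemporaryDirectory() as root:
    out = os.path.join(root, "tmp")
    first = results_dir(out, "chr22.fa")    # creates tmp/Results_chr22
    second = results_dir(out, "chr22.fa")   # second call is a no-op
    print(os.path.basename(first), first == second)  # Results_chr22 True
```

With this approach neither tmp nor tmp/Results_chr22 needs to be created by hand before the run.
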
AnimaTardeb commented 5 years ago

I still don't know why you need to create a directory. If it's a code error, I have to correct it.

But first, why are you working in your tmp folder?

Please see my console output below: I didn't need to create the results directory, and I could delete it and have it recreated automatically.

$pwd
/Users/MYSH/G4Hunter-V4/G4Github
$ls
G4Hunter.py
$./G4Hunter.py -i /Users/MYSH/G4Hunter-V4/exemple/Mitochondri.fasta -o /Users/MYSH/G4Hunter-V4/G4Github -w 25 -s 1.5

########################################################################
#                            Results directory Created                 #
########################################################################

 Input file: Mitochondri

 Results files and Score Figure are created in:   
/Users/MYSH/G4Hunter-V4/G4Github / Results_Mitochondri / 

$ls
G4Hunter.py     Results_Mitochondri
$./G4Hunter.py -i /Users/MYSH/G4Hunter-V4/exemple/Mitochondri.fasta -o /Users/MYSH/G4Hunter-V4/G4Github -w 25 -s 1.5
true Results_Mitochondri

     Re-evaluation of G-quadruplex propensity with G4Hunter 

#####################################
#    New Results directory Created  #
#####################################

 Input file: Mitochondri

 Results files and Score Figure are created in:   
/Users/MYSH/G4Hunter-V4/G4Github / Results_Mitochondri / 

$ls
G4Hunter.py     Results_Mitochondri
$

AnimaTardeb commented 5 years ago

The downloaded folder contains no file named Mitochondri.fasta, which is why you are getting the error.

The FASTA file is named Mitochondria_NC_012920_1.fasta: https://github.com/AnimaTardeb/G4Hunter/blob/master/Mitochondria_NC_012920_1.fasta

Best

Bedrat Amina dlF. https://github.com/AnimaTardeb/

On Sat, Sep 21, 2019 at 3:36 AM Dave Tang notifications@github.com wrote:

Example from a fresh start.

pwd
/home/dtang/github

git clone https://github.com/AnimaTardeb/G4Hunter.git && cd G4Hunter

python ./G4Hunter.py -i Mitochondri.fasta -o ~/github/G4Hunter/ -w 25 -s 1.5

########################################################################
#                            Results directory Created                 #
########################################################################

Traceback (most recent call last):
  File "./G4Hunter.py", line 301, in <module>
    filein=open(inputfile,"r")
IOError: [Errno 2] No such file or directory: 'Mitochondri.fasta'

davetang commented 5 years ago

Thank you for all your help.