AnimaTardeb / G4Hunter

G4Hunter (2012_2015)- IECB - Bordeaux
http://nar.oxfordjournals.org/content/44/4/1746
GNU General Public License v3.0

Python code for an implementation of G4-Hunter algorithm #2

Open JocelynSP opened 7 years ago

JocelynSP commented 7 years ago

This is an implementation of the algorithm in Bedrat, Lacroix & Mergny, "Re-evaluation of G-quadruplex propensity with G4Hunter" (2016).

It merges windows into regions more sensibly than the supplied binary executable, so that regions do not overlap. Merged regions reflect the published algorithm in that terminal As and Ts are not shown. When run on the supplied Mitochondria_NC_012920_1.fasta, the window scores agree with those of the supplied binary executable. It does not currently output a Score_plot.pdf.
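To illustrate the merging step described above, here is a minimal sketch, not the code from this pull request. It assumes 0-based, end-exclusive coordinates, and the function names `merge_windows` and `trim_terminal_at` are hypothetical. It joins overlapping or touching windows into non-overlapping regions and then drops terminal A/T bases.

```python
def merge_windows(hit_starts, window):
    """Merge windows (given by their 0-based start index) into regions.

    Windows that overlap or touch are joined, so the returned
    (start, end) regions, end-exclusive, never overlap.
    """
    regions = []
    for start in sorted(hit_starts):
        end = start + window
        if regions and start <= regions[-1][1]:
            # Overlaps or touches the previous region: extend it.
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(region) for region in regions]


def trim_terminal_at(seq, start, end):
    """Shrink a region so that it neither starts nor ends with A or T."""
    while start < end and seq[start].upper() in "AT":
        start += 1
    while end > start and seq[end - 1].upper() in "AT":
        end -= 1
    return start, end
```

For example, windows of length 5 starting at 0, 1, 2 and 10 collapse into two regions, and each region can then be passed through `trim_terminal_at` before it is reported.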

JocelynSP commented 7 years ago

I have now matched the scoring system, so windows have their score adjusted for the length of any run that extends outside the window. With a window of 25 nt and a threshold of 1.5, this gives the same output as the original for Mitochondria_NC_012920_1.fasta (except that it is tab-separated instead of space-separated).
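The scoring described above can be sketched as follows. This is an illustration based on the scoring scheme in the G4Hunter paper (each G in a run of n scores n, capped at 4; C runs score the negative; A and T score 0), not the code from this pull request, and the function names are hypothetical. Because run lengths are measured on the whole sequence before windowing, a window that cuts through a run still sees the full-run score.

```python
def base_scores(seq):
    """Per-base scores: +min(run, 4) for G runs, -min(run, 4) for C runs."""
    seq = seq.upper()
    scores = [0] * len(seq)
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1  # j ends one past the current run of identical bases
        capped = min(j - i, 4)
        if seq[i] == "G":
            value = capped
        elif seq[i] == "C":
            value = -capped
        else:
            value = 0
        for k in range(i, j):
            scores[k] = value
        i = j
    return scores


def window_scores(seq, window=25):
    """Mean base score for every window, using a running sum."""
    scores = base_scores(seq)
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    means = [total / window]
    for start in range(1, len(scores) - window + 1):
        total += scores[start + window - 1] - scores[start - 1]
        means.append(total / window)
    return means
```

With a window of 2 on `TGGGGT`, the inner windows each score 4.0 rather than 2.0, because the G run is four bases long even though only two of them fall inside the window.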

mahzer commented 6 years ago

Hi Jocelyn, nice work! Is it possible to get the result as a BED file when using a reference genome as input? I know it's possible with the original R script, but I'd like to try the new feature that you have implemented. Thanks, MZ

AnimaTardeb commented 6 years ago

Hey,

Jocelyn did nice work, but I still don't understand why the flanking bases bother so many people, since they can play an important role in G4 folding in vitro.

As I have done a lot of experiments, I needed to know all the bases that may or may not play a role in G4 folding.

It is nice to have a 60-base sequence containing 3 potential G4 sequences, but it is also nice to know that when I separate them, it is because the bases between two sequences weaken the score; those should be the best positions at which to separate the three G4s for in vitro or in vivo testing.

As for your question, MZ: BED files look like this, if I am not wrong:

chromosome name \t start \t end

I think you can add the chromosome name column in Excel and save the file as .bed.

Also, I have never heard of an original G4 program in R; I always coded G4-Hunter in Python and did the statistics in R.

Again, Jocelyn, nice code. I am not one who favors merging the sequences, but it is nice to have another version of the code.

Have a nice day

Amina B

https://github.com/AnimaTardeb/

JocelynSP commented 6 years ago

Hi Mahzer, I don't know what original R code you are referring to. Do you mean the original Python script / binary, or might you be on the wrong post?

I am not interested in doing more work on this script, but it would not be hard to add BED-format output, or to convert the _merged.tsv file to BED format.

BED files have no column headers. They have 3 to 12 tab-separated fields, of which chrom, start and end are required, as Amina said above. See: https://genome.ucsc.edu/FAQ/FAQformat.html#format1

In the _merged.tsv file, the chrom is a section heading and would have to be written in field 1 instead; then Start and End can go in fields 2 and 3. The Sequence could go in field 4 (name), or name could just be '.'. Then field 5 (score) is Score, and no other optional fields would be used.

Jocelyn
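The conversion described above could look roughly like the sketch below. It is not part of the script in this pull request, and the input layout is an assumption that should be checked against a real _merged.tsv file: a section heading line starting with `>`, an optional column-header row whose first field is `Start`, and data rows of Start, End, Sequence and Score separated by tabs. The function name `merged_tsv_to_bed` and the `start_offset` parameter are hypothetical.

```python
def merged_tsv_to_bed(lines, start_offset=0):
    """Yield 5-field BED lines (chrom, start, end, name, score).

    ASSUMED input layout (hypothetical, verify before use):
      >heading line naming the sequence
      Start<TAB>End<TAB>Sequence<TAB>Score   (optional header row)
      data rows in that same column order
    """
    chrom = None
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith(">"):
            # The section heading becomes BED field 1 for the rows below it.
            words = line[1:].split()
            chrom = words[0] if words else "."
            continue
        fields = line.split("\t")
        if fields[0] == "Start":
            continue  # BED files have no column headers
        if chrom is None:
            raise ValueError("data row found before any section heading")
        start, end, sequence, score = fields[:4]
        bed_start = str(int(start) + start_offset)
        yield "\t".join([chrom, bed_start, end, sequence, score])


if __name__ == "__main__":
    example = [">chrM", "Start\tEnd\tSequence\tScore", "10\t35\tGGGAGGG\t1.6"]
    for bed_line in merged_tsv_to_bed(example):
        print(bed_line)
```

BED start coordinates are 0-based and the end is exclusive, so if the TSV turns out to use 1-based inclusive starts, passing `start_offset=-1` would adjust them. The UCSC description defines the score field as an integer from 0 to 1000, so strict validators may reject fractional G4Hunter scores even though many tools accept them.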

mahzer commented 6 years ago

Thanks, Amina and Jocelyn.

I was referring to the R scripts included in the supplementary material of the paper. I did not look at all of them and thought one of them was the actual code in R.

MZ