comprna / MoSEA

Motif Scan and Enrichment Analysis (MoSEA)
ISC License

variable event support #1

Open PeterVenhuizen opened 6 years ago

PeterVenhuizen commented 6 years ago

Will variable SUPPA events be supported?

EduEyras commented 6 years ago

Hi Peter,

Variable SUPPA events are supported. We did some benchmarking with RT-PCR experiments in plants and they work better than using the strict boundaries. JC might be able to tell you more about it.

Please, let me know if you see any issues with the variable events.

Thanks

Eduardo

-- Dr E Eyras

ICREA Research Professor Universitat Pompeu Fabra PRBB, Dr Aiguader 88 Tel: +34 93 316 0502 (ext 1502) E08003 Barcelona, Spain Fax: +34 93 316 0550

http://scholar.google.com/citations?user=LiojlGoAAAAJ http://www.researcherid.com/rid/L-1053-2014 http://regulatorygenomics.upf.edu/

PeterVenhuizen commented 6 years ago

Hi Eduardo,

As of now I cannot use the suppa_to_bed.py script to generate the bed files for the RI events: it expects 4 coordinates in the event_id, but the variable RI event_ids contain only the coordinates of the retained intron. Running suppa_to_bed.py with variable events therefore raises IndexErrors. I assume I could run MoSEA if I generate the bed files myself, but I was wondering whether suppa_to_bed.py will be updated to support variable events.
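
To make the mismatch concrete, here is a minimal sketch that tells the two notations apart by counting the colon-separated fields of the event part. The helper name `classify_ri_event` is hypothetical, and the 4-field variable form shown is an assumption based on the description above (only the retained-intron coordinates are present), not a confirmed SUPPA specification.

```python
def classify_ri_event(event_id):
    """Return ('strict' or 'variable', fields) for a SUPPA RI event ID.

    Assumed layouts (the variable one is an assumption):
      strict:   GENE;RI:chr:s1:e1-s2:e2:strand   -> 6 fields
      variable: GENE;RI:chr:e1-s2:strand         -> 4 fields
    """
    # Drop the gene part before ';' and split the event part on ':'
    fields = event_id.split(';')[1].split(':')
    if len(fields) == 6:
        return 'strict', fields
    if len(fields) == 4:
        return 'variable', fields
    raise ValueError("Unexpected RI event ID layout: {}".format(event_id))


# Strict ID as used in the SUPPA documentation example
kind, fields = classify_ri_event(
    "TIAL1|7073;RI:chr10:121336123:121336262-121336592:121336715:-")
print(kind, fields[3])

# Hypothetical variable ID carrying only the intron coordinates
kind, fields = classify_ri_event("TIAL1|7073;RI:chr10:121336262-121336592:-")
print(kind, fields[2])
```

Code that unpacks six fields unconditionally fails on the second form, which is consistent with the errors described here.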

Best Peter

EduEyras commented 6 years ago

Thanks for pointing this out. We did not develop this part yet.

Yes, you can generate bed files from the event coordinates to run MoSEA. MoSEA can read the standard events but not yet the variable-boundary notation; it can, however, read any bed file.
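
As a rough illustration of generating such a file by hand, the sketch below writes six-column BED lines to any file-like object. The function name `write_bed6` and the record layout are hypothetical, and it assumes the coordinates have already been converted to BED's 0-based, half-open convention; which columns MoSEA actually requires is not confirmed here.

```python
import io


def write_bed6(records, handle):
    """Write (chrom, start, end, name, score, strand) tuples as BED6 lines."""
    for chrom, start, end, name, score, strand in records:
        # BED intervals are 0-based and half-open, so end must exceed start
        if start < 0 or end <= start:
            raise ValueError("Invalid interval for {}: {}-{}".format(name, start, end))
        if strand not in ('+', '-'):
            raise ValueError("Invalid strand for {}: {}".format(name, strand))
        handle.write("{}\t{}\t{}\t{}\t{}\t{}\n".format(
            chrom, start, end, name, score, strand))


# Usage: write one region into an in-memory buffer instead of a real file
buf = io.StringIO()
write_bed6([("chr10", 121336262, 121336592, "TIAL1|7073;V2", 0.1, "-")], buf)
print(buf.getvalue(), end="")
```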

Thanks

Eduardo

PeterVenhuizen commented 6 years ago

I've written an updated version of the fun_RI_bedfile function that supports variable events. I have not tested it extensively, but it generates the V1, V2, and V3 coordinates from the variable RI events. However, I think it will break if a variable RI event is given without len_ext; I have not yet tested this.

The updated function is below. Feel free to use it.

import sys  # needed for sys.exit() below

def fun_RI_bedfile(in_file, out_file, len_ext, event, mediandiff):
    '''
    ##ref: see Fig.2 suppa documentation of RI e1 & s1..descriptions
    #https://bitbucket.org/regulatorygenomicsupf/suppa
    #Example event id: 
    #TIAL1|7073;RI:chr10:121336123:121336262-121336592:121336715:-
    '''

    fo = open(out_file, "a")

    variable = False
    ev_all = event.split(';')[1].split(':')
    if len(ev_all) == 6: #strict
        ev_type, ev_chr, s1, e1_s2, e2, ev_strand = event.split(';')[1].split(':')

        s1 = int(s1)
        e1, s2 = map(int, e1_s2.split('-'))
        e2 = int(e2)

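    else: #variable
        ev_type, ev_chr, e1_s2, ev_strand = event.split(';')[1].split(':')
        variable = True
        if not len_ext:
            # Variable events carry no flanking exon coordinates,
            # so V1 and V3 can only be derived from len_ext
            raise ValueError("len_ext is required for variable RI event: {}".format(event))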

    e1, s2 = map(int, e1_s2.split('-'))

    if ev_strand == '+':
        V2 = "{}\t{}".format(e1, s2) 
        if len_ext or variable:
            V1 = "{}\t{}".format(e1 - len_ext, e1) 
            V3 = "{}\t{}".format(s2, s2 + len_ext)
        else: 
            V1 = "{}\t{}".format(s1, e1) 
            V3 = "{}\t{}".format(s2, e2) 

    elif ev_strand == "-":
        V2 = "{}\t{}".format(e1, s2) 
        s2 = s2 -1 #for 0-based correction

        if len_ext or variable:
            V1 = "{}\t{}".format(s2, s2 + len_ext)
            V3 = "{}\t{}".format(e1 - len_ext, e1)
        else:
            V1 = "{}\t{}".format(s2, e2) 
            V3 = "{}\t{}".format(s1, e1) 

    else:
        print("No strand information in file, for event: {}".format(event))
        sys.exit()

    fo.write("{}\t{}\t{};V1\t{}\t{}\n".format(ev_chr, V1, event, mediandiff, ev_strand))
    fo.write("{}\t{}\t{};V2\t{}\t{}\n".format(ev_chr, V2, event, mediandiff, ev_strand))
    fo.write("{}\t{}\t{};V3\t{}\t{}\n".format(ev_chr, V3, event, mediandiff, ev_strand))

    fo.close()
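
For quick checks without writing files, the variable-event branch above can be restated as a pure function. This is only a sketch that mirrors the coordinate arithmetic of the function as posted (including the minus-strand `s2 - 1` adjustment); the name `variable_ri_regions` and the dictionary return value are hypothetical and not part of MoSEA.

```python
def variable_ri_regions(e1, s2, strand, len_ext):
    """Return V1, V2, V3 (start, end) tuples for a variable RI event."""
    if not len_ext or len_ext < 0:
        # Without flanking exon coordinates, the flanks need a positive len_ext
        raise ValueError("len_ext must be a positive integer")
    v2 = (e1, s2)  # the retained intron itself
    if strand == '+':
        v1 = (e1 - len_ext, e1)
        v3 = (s2, s2 + len_ext)
    elif strand == '-':
        s2_adj = s2 - 1  # same 0-based correction as in the posted function
        v1 = (s2_adj, s2_adj + len_ext)
        v3 = (e1 - len_ext, e1)
    else:
        raise ValueError("Unknown strand: {}".format(strand))
    return {'V1': v1, 'V2': v2, 'V3': v3}


# Usage with toy coordinates and a 50-nt extension
print(variable_ri_regions(100, 200, '+', 50))
print(variable_ri_regions(100, 200, '-', 50))
```

Returning the intervals instead of writing them makes it straightforward to compare the plus- and minus-strand results side by side before generating the bed file.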