WangHYLab / fcirc

a python pipeline for linear and circular RNAs of known fusions exploration
MIT License

The code is not compatible with other GTF files #2

Closed gnilihzeux closed 3 years ago

gnilihzeux commented 3 years ago

Dear author, I'm very interested in your program, but the code is not yet robust, especially for custom genome information. I found a bug in your build_graph.py, in this line:

chr,Type,start,end,strand,gene=Line[0],Line[2],Line[3],Line[4],Line[6],Line[-1].split(';')[2].replace('gene_name "','').replace('"','').strip()

If gene_name is not the third attribute (index 2 after splitting on ';'), then nothing works anymore.

Programs for f-circRNA are very rare. I look forward to your updates. Thanks.
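To illustrate the failure, here is a minimal sketch that applies the same positional extraction to two attribute strings. Both strings are made-up examples, and `extract_by_position` is a hypothetical wrapper around the expression quoted above, not a function in fcirc.

```python
def extract_by_position(attributes):
    # Same logic as the quoted line: take the field at index 2,
    # then strip the 'gene_name "' prefix and the quotes.
    return attributes.split(';')[2].replace('gene_name "', '').replace('"', '').strip()

# gene_name sits at index 2: the expression works.
expected_layout = 'gene_id "G1"; transcript_id "T1"; gene_name "TP53";'
# gene_name sits at index 3: index 2 now holds gene_type instead.
other_layout = 'gene_id "G1"; transcript_id "T1"; gene_type "protein_coding"; gene_name "TP53";'

print(extract_by_position(expected_layout))  # TP53
print(extract_by_position(other_layout))     # gene_type protein_coding
```

With the second layout the "gene" silently becomes `gene_type protein_coding`, so no error is raised and the wrong value flows into the later steps.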

gnilihzeux commented 3 years ago

I'm not familiar with Python, but here is my attempt:

[name for name in Line[-1].split(';') if 'gene_name' in name][0].replace('gene_name "','').replace('"','').strip()
zhixue commented 3 years ago

Thank you for using Fcirc. I am one of the authors of the project, which was my undergraduate thesis design. Although I have graduated, I can give you some suggestions to solve this issue.

You can add this function at the top of the file (with its default arguments it can also read GFF attributes).

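def string2dict(long_string, sep=';', eq='=', rm_quote=False):
    # Parse an attribute string into a dict; the defaults suit GFF 'key=value;' fields.
    if rm_quote:
        long_string = long_string.replace('"', '').replace("'", '')
    out_dict = dict()
    for item in long_string.strip().split(sep):
        item = item.strip()
        if not item:
            continue  # skip empty fields, e.g. after a trailing separator
        # split on the first delimiter only, so values containing it stay intact
        key, _, value = item.partition(eq)
        out_dict[key] = value.strip()
    return out_dict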
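As a quick check of the default arguments, this sketch parses a GFF3-style attribute string, where fields look like `key=value`. The function is restated in condensed form so the snippet runs on its own; the attribute string is a made-up example.

```python
def string2dict(long_string, sep=';', eq='=', rm_quote=False):
    if rm_quote:
        long_string = long_string.replace('"', '').replace("'", '')
    long_string = long_string.replace('; ', ';')
    out_dict = dict()
    for i in long_string.rstrip(sep).split(sep):
        key, value = i.split(eq)
        out_dict[key] = value
    return out_dict

# GFF3-style attributes: no quotes, '=' between key and value
gff_att = 'ID=gene0001;Name=TP53;biotype=protein_coding'
att = string2dict(gff_att)  # defaults: sep=';', eq='='
print(att['Name'])
# TP53
```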

In the middle of the file, replace the faulty line with the following:

chr,Type,start,end,strand=Line[0],Line[2],Line[3],Line[4],Line[6]
att = string2dict(Line[-1],sep=';',eq=' ',rm_quote=True)

# prefer gene_name; fall back to gene_id when gene_name is absent
if "gene_name" in att:
    gene = att['gene_name']
elif "gene_id" in att:
    gene = att['gene_id']
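One gap remains in the snippet above: if a line has neither `gene_name` nor `gene_id`, `gene` is never assigned (or, inside a loop, keeps the value from the previous line). A minimal sketch of one way to make that case explicit follows. `pick_gene` is a hypothetical helper, not part of fcirc, and how the caller should treat a `None` result (skip the line, raise an error) is an assumption left open here.

```python
def pick_gene(att):
    """Return gene_name if present, else gene_id, else None."""
    for key in ('gene_name', 'gene_id'):
        if key in att:
            return att[key]
    return None

# Attribute dicts shaped like the output of string2dict (made-up values)
print(pick_gene({'gene_id': 'ENSGXXXXXX', 'gene_name': 'TP53'}))  # TP53
print(pick_gene({'gene_id': 'ENSGXXXXXX'}))                       # ENSGXXXXXX
print(pick_gene({'transcript_id': 'ENSTXXXXXX'}))                 # None
```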

You can test this in the Python 3 console:

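def string2dict(long_string, sep=';', eq='=', rm_quote=False):
    # Parse an attribute string into a dict; the defaults suit GFF 'key=value;' fields.
    if rm_quote:
        long_string = long_string.replace('"', '').replace("'", '')
    out_dict = dict()
    for item in long_string.strip().split(sep):
        item = item.strip()
        if not item:
            continue  # skip empty fields, e.g. after a trailing separator
        # split on the first delimiter only, so values containing it stay intact
        key, _, value = item.partition(eq)
        out_dict[key] = value.strip()
    return out_dict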

string_att = 'transcript_id "ENSGXXXXXX.1"; gene_id "ENSGXXXXXX"; gene_name "TP53";'
att = string2dict(string_att, sep=';',eq=' ',rm_quote=True)

if "gene_name" in att:
    gene = att['gene_name']
elif "gene_id" in att:
    gene = att['gene_id']

print(gene)
# TP53
gnilihzeux commented 3 years ago

Thanks a lot.