dputhier / pygtftk

A python package and a set of shell commands to handle GTF files
GNU General Public License v3.0

get_5p_3p_coords using the --more-names argument #177

Closed yaskermezli closed 2 years ago

yaskermezli commented 2 years ago

Hi,

I'm using the get_5p_3p_coords command to extract TSS coordinates in BED format from the GRCh38 GTF. However, with the arguments -n merge_gene_id_name --more-names gene_biotype, it only pastes the literal string "gene_biotype" after each merge_gene_id_name (in every row).

Here is an example:

chr1 11868 11869 ENSG00000223972|DDX11L1|gene_biotype . +
chr1 29569 29570 ENSG00000227232|WASH7P|gene_biotype . -

I don't know if this is the intended behavior, but I was expecting to get something like this:

chr1 11868 11869 ENSG00000223972|DDX11L1|LncRNA . +
chr1 29569 29570 ENSG00000227232|WASH7P|Protein_coding . -

Thank you.

dputhier commented 2 years ago

Hi Yasmina,

According to the help section, --names is a comma-separated list:

-n, --names The key(s) that should be used as name. (default: gene_id,transcript_id)

So you should try:

-n merge_gene_id_name,gene_biotype

--more-names is a comma-separated list of literal strings to be added to the 'name' column of the BED file. It is meant for external names (any text you want to add), not for attribute keys.
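
To make the difference concrete, here is a minimal Python sketch of the behavior described above. It is not pygtftk's code: build_name, its parameters, and the "." placeholder for a missing key are hypothetical, chosen only for illustration, and the record values are copied from the example rows in this thread.

```python
def build_name(attributes, names, more_names=(), separator="|"):
    """Illustrative model of how the BED 'name' column could be assembled."""
    # Entries of `names` are attribute keys: each one is looked up in the record.
    # The "." fallback for a missing key is an assumption of this sketch.
    parts = [attributes.get(key, ".") for key in names]
    # Entries of `more_names` are literal text: appended unchanged, no lookup.
    parts.extend(more_names)
    return separator.join(parts)


# One record, with values taken from the example rows in this thread.
record = {
    "merge_gene_id_name": "ENSG00000223972|DDX11L1",
    "gene_biotype": "LncRNA",
}

# Models: -n merge_gene_id_name --more-names gene_biotype
literal = build_name(record, ["merge_gene_id_name"], ["gene_biotype"])

# Models: -n merge_gene_id_name,gene_biotype
looked_up = build_name(record, ["merge_gene_id_name", "gene_biotype"])

print(literal)    # ENSG00000223972|DDX11L1|gene_biotype
print(looked_up)  # ENSG00000223972|DDX11L1|LncRNA
```

In this model, the first call reproduces the output reported in the issue, while the second gives the expected one, because gene_biotype is treated as a key only when it is passed through the names list.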

Best

yaskermezli commented 2 years ago

Ah yes! Thank you.
