adamewing / methylartist

Tools for plotting methylation data in various ways
MIT License

Question regarding name of chromosome #75

Open OceaneMion opened 6 months ago

OceaneMion commented 6 months ago

Hi, sorry to bother you. I'm trying to use methylartist on my data, but I am running into some problems.

First, my chromosomes are named contig_1, contig_2, etc. in all my data. Is that a problem, and do I need to rename them to chr1, chr2, etc.?

Second, I am using a GFF file from RepeatMasker (a tool for annotating transposable elements), and my records look like this:

contig_1 RepeatMasker dispersed_repeat 21700 21913 3.3 + . ID=1;Target "Motif:RM2_rnd-1_family-0" 1666 1879

Do I need to rename anything here? For example, should the ID attribute be renamed to gene_id?

Third, I have done modified basecalling with Dorado. Is it possible to use Dorado's ubam output directly with the methylartist region command, or does it need to be aligned to a reference genome beforehand? What would the command lines be?

Thanks in advance, and have a nice day

adamewing commented 6 months ago

Hi there, as long as the chromosome names are used consistently across all the input files you're using, you should be fine (let me know if it isn't!).
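One way to check that consistency before plotting is to compare the sequence names in an annotation file against the reference index. This is a minimal sketch, not part of methylartist: it assumes a tab-delimited annotation (BED or GFF) with the sequence name in the first column and a .fai index as written by samtools faidx, whose first column is also the sequence name. The function names are hypothetical.

```python
def first_column_names(lines):
    """Collect the first whitespace-delimited field of every data line."""
    names = set()
    for line in lines:
        # Skip blank lines and comment/header lines such as '##gff-version 3'
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split()[0])
    return names


def unmatched_names(annotation_lines, fai_lines):
    """Return annotation sequence names that are absent from the .fai index."""
    return first_column_names(annotation_lines) - first_column_names(fai_lines)


if __name__ == "__main__":
    fai = ["contig_1\t500000\t10\t60\t61\n", "contig_2\t400000\t508400\t60\t61\n"]
    bed = ["contig_1\t100\t200\tTE1\n", "chr2\t5\t50\tTE2\n"]
    print(sorted(unmatched_names(bed, fai)))
```

An empty result means every name in the annotation also exists in the reference; anything returned is a name that would not match at plot time.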

On the second point, I think you sent me more details in an email. Is the gff file compressed with bgzip and indexed with tabix?

An alternative to the gff file would be a .bed file via the --bed option. This takes BED3+1 format (chromosome, start, end, name) and plots entries as rectangles on the gene track.

Not sure what you mean by Dorado ubam, but I'm guessing you mean an unmapped bam file? Best would be to map it and use it as input via -b/--bam. It's hard to give an exact command line, but dorado will use minimap2 to do the mapping if you give it a reference genome via --reference. If you don't want to re-call everything, you could map the .bam file (convert to fastq, map with minimap2) and merge the alignments via Picard MergeBamAlignment: https://gatk.broadinstitute.org/hc/en-us/articles/360057439611-MergeBamAlignment-Picard
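For the RepeatMasker record quoted above, a conversion to BED3+1 could look roughly like the following. This is a sketch under stated assumptions: the function name is hypothetical, the repeat name is taken from the "Motif:" part of the Target attribute, and the start is shifted by one because GFF coordinates are 1-based inclusive while BED is conventionally 0-based half-open. Whether methylartist interprets --bed coordinates exactly that way is not confirmed here.

```python
import re


def gff_to_bed4(gff_line):
    """Convert one RepeatMasker GFF record to a BED3+1 line (chrom, start, end, name)."""
    fields = gff_line.rstrip("\n").split("\t")
    if len(fields) < 9:
        # Fall back to whitespace splitting, keeping the attribute column whole
        fields = gff_line.split(None, 8)
    chrom = fields[0]
    start = int(fields[3])
    end = int(fields[4])
    attributes = fields[8]

    # Assumption: the repeat family name follows 'Motif:' in the Target attribute
    match = re.search(r'Motif:([^"\s;]+)', attributes)
    name = match.group(1) if match else "repeat"

    # GFF is 1-based inclusive; BED is 0-based half-open, so only the start shifts
    return f"{chrom}\t{start - 1}\t{end}\t{name}"


if __name__ == "__main__":
    record = ('contig_1 RepeatMasker dispersed_repeat 21700 21913 3.3 + . '
              'ID=1;Target "Motif:RM2_rnd-1_family-0" 1666 1879')
    print(gff_to_bed4(record))
```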

OceaneMion commented 6 months ago

Thanks, I will try with a bed file. However, the --bed option is only available for methylartist locus. Is there a way to implement it for region?

adamewing commented 6 months ago

That's a good point, will implement bed input in region.

OceaneMion commented 6 months ago

Thanks a lot! It worked using a bed input for methylartist locus. Also, as I'm working on transposable elements, do you know if there is a way to hide the name of the gene/transposable element on the plot and only draw the corresponding rectangle?

This is the output I have. Is it possible to put a legend on the right side for the transposable element annotation? I think it would be a lot more pleasing aesthetically:

[Image: aligned_sort_Gd293 contig_19_1970040_2060127 hm ms3 smw84 locus meth]

Also, I would like to know: what is the reference sliding window size that you use for methylartist plots?

adamewing commented 5 months ago

If you pull the latest (d2067f1), there is now a --hidebedlabel option for both region and locus, and region supports the --bed flag.

adamewing commented 5 months ago

For the other questions, it should be possible to make a legend for different categories from a .bed file, though it might take a while to get to. In the meantime, you can specify the colour of a segment in the .bed in column 6 (chrom, start, end, label, strand, colour) and could then construct a legend in a vector graphics program (Illustrator or similar). Not ideal, I know.
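As a rough illustration of the six-column layout described above, the sketch below assigns a colour per category and writes chrom, start, end, label, strand, colour. Everything specific in it is an assumption for illustration: the function names, the example labels, grouping categories by label prefix, and the use of hex colour strings (the exact colour syntax methylartist accepts is not stated in this thread).

```python
def colour_for(label, palette, default="#808080"):
    """Pick the colour of the first palette prefix that the label starts with."""
    for prefix, colour in palette.items():
        if label.startswith(prefix):
            return colour
    return default


def to_coloured_bed(rows, palette):
    """Format (chrom, start, end, label, strand) rows as 6-column BED lines."""
    lines = []
    for chrom, start, end, label, strand in rows:
        colour = colour_for(label, palette)
        lines.append("\t".join([chrom, str(start), str(end), label, strand, colour]))
    return lines


if __name__ == "__main__":
    # Hypothetical transposable element labels and colours
    palette = {"LTR": "#d95f02", "DNA": "#1b9e77"}
    rows = [
        ("contig_19", 1970100, 1970500, "LTR_Gypsy_1", "+"),
        ("contig_19", 1980000, 1980300, "DNA_hAT_2", "-"),
    ]
    print("\n".join(to_coloured_bed(rows, palette)))
```

Keeping the palette in one dictionary means the same mapping can be reused when drawing the legend by hand.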

When you say "reference sliding window size" do you mean the automatic setting for -s/--smoothwindowsize?

senukobla commented 5 months ago

I am also having an issue with the chromosome annotation. When I view the bam with bedtools, this is what I see:

[Image: viewofchr]

But when I run the command below, I get an error:

2024-06-11 17:59:49,101 starting methylartist with command: /users/PES0842/bebersole/.local/bin/methylartist locus -d methartBC01.txt -i NC_000022.11:c22559265-22547701
Traceback (most recent call last):
  File "/users/PES0842/bebersole/.local/bin/methylartist", line 5755, in <module>
    main()
  File "/users/PES0842/bebersole/.local/bin/methylartist", line 5427, in main
    args.func(args)
  File "/users/PES0842/bebersole/.local/bin/methylartist", line 2943, in locus
    elt_start = int(elt_start.replace(',',''))
ValueError: invalid literal for int() with base 10: 'c22559265'

It appears the "c" is causing the issue, since an integer is expected. Am I correct?

adamewing commented 5 months ago

Yes, coordinates need to be integers. Is there a reason you need the "c" character in there?
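If the interval was copied from NCBI, the "c" prefix most likely marks a complement-strand range, written with the larger coordinate first. Under that assumption, a small helper (hypothetical, not part of methylartist) can rewrite it into the plain chrom:start-end form with integer coordinates that the -i option expects:

```python
import re


def normalise_interval(interval):
    """Turn 'chrom:c<end>-<start>' or 'chrom:<start>-<end>' into ('chrom:start-end', strand)."""
    match = re.fullmatch(r"(.+):(c?)([\d,]+)-([\d,]+)", interval)
    if match is None:
        raise ValueError(f"unrecognised interval: {interval!r}")
    chrom, complement, first, second = match.groups()

    # Thousands separators are stripped before conversion
    first = int(first.replace(",", ""))
    second = int(second.replace(",", ""))

    # Assumption: a leading 'c' means complement strand with coordinates reversed
    start, end = min(first, second), max(first, second)
    strand = "-" if complement else "+"
    return f"{chrom}:{start}-{end}", strand


if __name__ == "__main__":
    print(normalise_interval("NC_000022.11:c22559265-22547701"))
```

The returned strand is only informational; the interval string alone is what would be passed on the command line.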