Closed josruirod closed 3 years ago
Hey @josruirod, I see you have the refgene_anno
file now as related to your past refgenie issue. I'm curious, did you end up manually adding the refgene_anno
asset, or did you use the build recipe, i.e. refgenie build AgamP_VB/refgene_anno --files refgene=/path/to/your_refgene_file.txt.gz
. I ask, because the build recipe should automatically generate that refgene_tss
file which is a child asset of the refgene_anno
asset. If you added it manually, it would not have produced this file. In order to produce it manually you could recreate the recipe command and produce the file manually.
gzip -dcf /path/to/your_refgene_file.txt.gz | awk '{{if($4=="+"){{print $3"\t"$5"\t"$5"\t"$13"\t.\t"$4}}else{{print $3"\t"$6"\t"$6"\t"$13"\t.\t"$4}}}}' | LC_COLLATE=C sort -k1,1 -k2,2n -u > refgene_tss.bed
Assuming you were able to recreate the refgene_anno
file based on the file schema, the columns will line up correctly for the appropriate output file to be produced.
Super, thanks.
I did recreate the refgene txt file for my organism, and used refgenie build AgamP_VB/refgene_anno -c $REFGENIE --files refgene=$folder3/our_refgene_file.txt.gz
. Why the TSS file was not created then?
Anyhow, I've been able to produce it with the line you provided, thanks. Where should I place it? Or should I give it to pepatac with --TSS-name?
And I'm curious why the peaks are not being annotated either. Is this also related to the lack of TSSs? I also built the assets gencode_gtf and ensembl_gtf with a custom gtf as you can see above with refgene list
Thanks
Do you happen to have the output from the build command still available? Could look to see if it threw any errors, because yes it should have been produced. And yes, you can just pass it with that command. Probably the simplest approach. Yeah, the peaks should be utilizing the feat_annotation
asset too, which I see you have. If you want to share a sample's PEPATAC_log.md
file here I can look over it and see if anything jumps out to me immediately. You'll need to change the extension to .txt
to share it on GitHub. Could you also share a preview of your feat_annotation
asset, and let's confirm it looks correct too.
So you could refgenie seek AgamP_VB/feat_annotation
to retrieve the path to the asset. Then just zcat -dc <that_path> | head
.
HI, thanks for all the comment. I may have done something wrong imitating the format or there's not enough info on the available annotation, and I can proceed externally to pepatac, but I'd like to confirm.
So, the output of refgenie seek AgamP_VB/feat_annotation
is an error because it's not finding it:
refgenconf.exceptions.MissingAssetError:` Genome 'AgamP_VB' exists, but asset 'feat_annotation' is missing
As you can see in my previous message, I have built AgamP_VB/refgene_anno, not AgamP_VB/feat_annotation.
I have looked with refgenie build AgamP_VB/feat_annotation -c $REFGENIE --requirements
and ensembl_gtf and ensembl_rb are required. I do have a gtf file but I'm not sure if I have an ensembl_rb. Could you point me to the differences or specs of those files?
I'm attaching the output from the refgenie build AgamP_VB/refgene_ann
, the output from zcat -dc path/to/refgene_anno | head
, and the log of the run (one failed first, but the other is almost complete except for that I'd say).
Thanks!
[PEPATAC_log_test_1120.txt] [head_refgene_anno.txt] [build_feat_anno_log.txt]
Hey, sorry I misread that. Yeah, I see from the log it successfully produced a TSS file. What does /home/ipbln/jlruiz/bin/pepatac/genomes/AgamP_VB/refgene_anno/default/AgamP_VB_TSS.bed
look like if you head that file? And then concurrently, what does refgenie seek AgamP_VB/refgene_anno.refgene_tss
return?
The ensembl_rb file is the Ensembl regulatory build file, which I believe is limited to human, mouse, and maybe drosophila. It's essentially a breakdown of known annotated genomic features in gff
format. Here's a guide to manually building your own feat_annotation
asset.
I also noted you had very low alignment rates (0.09) with this test sample. Is that expected with this sample and species?
Thanks for the answer, when I head the TSS file it seems it's good:
AgamP4_2L 181305 181305 AGAP004677 . - AgamP4_2L 186936 186936 AGAP004677 . -
And the refgenie seek
output is:
/home/ipbln/jlruiz/bin/pepatac/genomes/AgamP_VB/refgene_anno/default/AgamP_VB_TSS.bed
So it seems it's successfully created, but for some reason pepatac is not getting it?
Thanks for the guide for manually building the feat_annotation. I did that and provided it with --anno-name. What type of features should be in column 4? Maybe it's failing because I did not provided any "TSS" feature?
Regarding the low alignment, this was just some testing data. I'll try again with a fresh complete run and keep you posted
For the feat_annotation, column 4 could be really any sort of feature of interest. If you have exons and introns or maybe promoters and enhancers that would be the "name" of the feature that should be present in column 4. So you could concatenate a file of known promoters and enhancers that are in the format chr, start, end, "promoter/enhancer", ".", strand
if that makes sense.
I believe the TSS_name issue you experienced was a bug. I just released a new pipeline version 0.9.9 to help there. If you do NOT provide a --TSS-name
to the pipeline it should by default use your refgene_tss
asset, which you confirmed exists and looks correct.
Okay, sounds good regarding the alignment. I just wanted to make sure there wasn't anything to note there from a pipeline perspective.
Hi, I can confirm that the refgene_tss and the annotation of peaks, the enrichment at TSSs... seem to be working now. This is great.
Thanks for the support
👍
Hi again, I'm trying to set up pepatac with a genome not present in ensembl. Pepatac is working great, but I'm need of some final help to get it fully working. I have use refgenie to build the recipes and this is the output of
refgenie list
I think I should have everything I need. However, in a run of pepatac I'm not getting the TSS enrichment plots and the annotation of the peaks. In the log I've found the corresponding lines, such as:
But I'm not sure on what I'm missing. It's not crucial because I can do this externally, but I'd like to fully use pepatac.
Thanks for the great support!