bartongroup / Simpson_Barton_Nanopore_1

jupyter notebooks for Parker et al. eLife 2019
MIT License

Request arabidopsis_thaliana_gene.tandem_gene_loci.txt as blacklist #2

Closed weir12 closed 4 years ago

weir12 commented 4 years ago

Hi, sorry for disturbing you again. I noticed that a blacklist (containing the TAIR tandem genes) is required for filtering false positives during chimeric RNA detection in the pipeline.

  1. Unfortunately, the PTGBase database mentioned in your article is currently unavailable. I sincerely hope you can provide the blacklist.
  2. In addition, does Araport11_GFF3_genes_transposons.201606.no_chr.gtf mean a GTF with the first column (seq_id) removed?

Thanks again, weir

mparker2 commented 4 years ago

Hi @weir12

No worries, here is a link to download the PTGBase listed genes. It is a shame that the website is down. I may upload the files to the GitHub repo as well:

https://drive.google.com/open?id=1RWhZo8-tTEZ8L4o87X1tW4rGyzMZvvvj

The annotation as downloaded from the Araport website had chromosomes labelled Chr1 Chr2 Chr3... ChrC ChrM whereas the TAIR10 fasta file had chromosome names 1, 2, 3... Mt, Pt. I renamed the contigs in the Araport GTF to match. So it depends where you got your reference sequences from.
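A minimal sketch of that kind of renaming, in case it helps. This is not the script used for the paper: the function names are hypothetical, and the mapping assumes `ChrC` (chloroplast) corresponds to `Pt` and `ChrM` (mitochondrion) to `Mt`. Only the first column (seq_id) is rewritten; the column itself is kept.

```python
import re

# Assumed mapping from Araport-style organelle names to the TAIR10 FASTA names.
ORGANELLE_NAMES = {"ChrC": "Pt", "ChrM": "Mt"}


def rename_contig(seq_id: str) -> str:
    """Convert an Araport-style contig name (Chr1, ChrC, ...) to 1, Pt, ..."""
    if seq_id in ORGANELLE_NAMES:
        return ORGANELLE_NAMES[seq_id]
    match = re.fullmatch(r"Chr(\d+)", seq_id)
    # Names that do not follow the expected pattern are returned unchanged.
    return match.group(1) if match else seq_id


def rename_gtf_line(line: str) -> str:
    """Rewrite only the first (seq_id) column of one GTF line."""
    if line.startswith("#") or not line.strip():
        return line  # leave comments and blank lines untouched
    seq_id, sep, rest = line.partition("\t")
    return rename_contig(seq_id) + sep + rest


def rename_gtf(in_path: str, out_path: str) -> None:
    """Stream a GTF file from in_path to out_path with renamed contigs."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(rename_gtf_line(line))
```

Calling `rename_gtf("input.gtf", "output.no_chr.gtf")` (both paths are placeholders) would write a copy whose contig names match a FASTA that uses 1, 2, 3... Mt, Pt. If your reference FASTA already uses the Chr1-style names, no renaming is needed.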

weir12 commented 4 years ago

Thanks a lot! By the way, is the BAM file required by the chimera_pipeline the output of this rule? https://github.com/bartongroup/Simpson_Barton_Nanopore_1/blob/1b509454a9e25a8c81be5092f8e525ca00e7b5a5/pipeline/nanoPARE_pipeline/Snakefile#L101

mparker2 commented 4 years ago

No, the nanoPARE pipeline is for the analysis of nanoPARE 5' tag data from Schon et al. 2018, which we use to benchmark the adapter-filtered full-length reads in figure 5 of the eLife paper. You should not use STAR to map nanopore DRS reads; use minimap2 instead. The rules for this are in the basecalling pipeline.
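As a rough illustration of a spliced minimap2 alignment for DRS reads, the sketch below builds a command line in Python. The flags (`-ax splice -uf -k14`) are the settings the minimap2 documentation suggests for Nanopore direct RNA reads; they are an assumption here and may differ from the exact parameters in the basecalling pipeline rules, so check the Snakefile. The function name and file names are placeholders.

```python
import shlex


def build_minimap2_command(reference_fasta: str, reads_fastq: str, threads: int = 4) -> list:
    """Build a minimap2 command for spliced alignment of direct RNA reads.

    Assumed flags, taken from the minimap2 documentation rather than
    from the pipeline in this repository:
      -ax splice  spliced alignment preset with SAM output
      -uf         only consider the forward transcript strand
      -k14        shorter k-mer, suggested for noisy direct RNA reads
    """
    return [
        "minimap2",
        "-t", str(threads),
        "-ax", "splice",
        "-uf",
        "-k14",
        reference_fasta,
        reads_fastq,
    ]


# Placeholder file names, for illustration only.
cmd = build_minimap2_command("TAIR10.fa", "drs_reads.fastq")
print(shlex.join(cmd))
```

minimap2 writes SAM to standard output, so in practice the command would be run with something like `subprocess.run(cmd, stdout=sam_file, check=True)` and the result sorted and converted to BAM afterwards.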