DrHogart closed this issue 4 years ago.
Hi Sergei @DrHogart!
Thanks for reporting the issue!
While miRNA analysis is mostly supported for the human genome, I was able to run an A. thaliana miRNA analysis once: https://github.com/bcbio/bcbio-nextgen/issues/1416
To enable this analysis for Drosophila, we need to create a mirbase recipe in cloudbiolinux (https://github.com/chapmanb/cloudbiolinux/tree/master/ggd-recipes/BDGP6), similar to the hg19 recipe for human (https://github.com/chapmanb/cloudbiolinux/blob/master/ggd-recipes/hg19/mirbase.yaml) or the A. thaliana one (https://github.com/chapmanb/cloudbiolinux/blob/master/ggd-recipes/TAIR10/mirbase.yaml).
Then we need to update the resources.yaml file in bcbio (https://github.com/bcbio/bcbio-nextgen/blob/master/config/genomes/BDGP6-resources.yaml), similar to the A. thaliana one (https://github.com/bcbio/bcbio-nextgen/blob/master/config/genomes/TAIR10-resources.yaml).
If you could help create and test this recipe (making sure all files are downloaded and correspond to the correct reference) and submit it as a pull request (PR), that would speed up the process.
Here is how to test a recipe: https://github.com/chapmanb/cloudbiolinux/blob/master/doc/hacking.md#testing-a-ggd-recipe
Sergey
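To check the "correspond to the correct reference" part of a recipe test, one option is to compare the sequence names used in an annotation with the contigs in the reference. The sketch below is not part of bcbio or cloudbiolinux; it assumes the reference FASTA has a samtools `.fai` index, and the script name `check_seqids.py` and the file names in the usage comment are hypothetical. It reports annotation seqids (column 1 of a GFF3/GTF) that are absent from the reference, which would flag a `chr2L` vs `2L` mismatch.

```python
import sys


def fai_contigs(lines):
    """Contig names (column 1) from the lines of a samtools .fai index."""
    return {line.split("\t", 1)[0] for line in lines if line.strip()}


def gff_seqids(lines):
    """Seqids (column 1) used by the feature lines of a GFF3/GTF file."""
    seqids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # GFF3 may embed sequences after this directive; stop there
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        seqids.add(line.split("\t", 1)[0])
    return seqids


def unmatched_seqids(gff_lines, fai_lines):
    """Sorted seqids that appear in the annotation but not in the reference."""
    return sorted(gff_seqids(gff_lines) - fai_contigs(fai_lines))


if __name__ == "__main__" and len(sys.argv) == 3:
    # hypothetical usage: python check_seqids.py srna-transcripts.gff BDGP6.fa.fai
    with open(sys.argv[1]) as gff, open(sys.argv[2]) as fai:
        missing = unmatched_seqids(gff, fai)
    print("seqids missing from reference:", ", ".join(missing) or "none")
```

The functions take any iterable of lines, so an open file handle works directly, and an empty result means every annotated sequence name exists in the reference.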
Thanks for the PRs! bcbio now installs srnaseq for BDGP6 for me.
bcbio also installs srnaseq for me, but seqbuster and mirdeep2 still don't work... I've just realized that the BDGP6 genome file is from Ensembl, where chromosome names are '2L', '2R' and so on, while srna-transcripts.gff, mirbase.gff3 and the other srnaseq files use 'chr2L', 'chr2R' and so on. Could this discrepancy be the reason? I can only check this tomorrow.
Yes, I think it is better to have chromosome names matching the reference. See, for example, the chromosome-mapping helper scripts in: https://github.com/chapmanb/cloudbiolinux/blob/master/ggd-recipes/hg19/topmed.yaml
S
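Renaming the chromosomes in an annotation amounts to rewriting column 1 of each feature line. The sketch below is a hypothetical stand-alone helper, not the cloudbiolinux mapping scripts linked above. It assumes the only difference is a `chr` prefix; because some contigs (for example mitochondrial or unplaced ones) may be named differently between Ensembl and UCSC, an explicit dictionary passed as `rename` is safer for a real conversion.

```python
import sys


def add_chr(name):
    """'2L' -> 'chr2L'; names that already start with 'chr' are left alone."""
    return name if name.startswith("chr") else "chr" + name


def strip_chr(name):
    """'chr2L' -> '2L'; names without the prefix are left alone."""
    return name[3:] if name.startswith("chr") else name


def rename_seqids(lines, rename):
    """Yield GFF/GTF lines with column 1 passed through `rename`.

    Blank lines and comment/directive lines are yielded unchanged, and so is a
    GFF3 ##FASTA section together with everything after it.
    """
    in_fasta = False
    for line in lines:
        if line.startswith("##FASTA"):
            in_fasta = True
        if in_fasta or line.startswith("#") or not line.strip():
            yield line
            continue
        # feature lines are tab-separated; only the first field is renamed
        seqid, rest = line.split("\t", 1)
        yield rename(seqid) + "\t" + rest


if __name__ == "__main__" and len(sys.argv) == 2:
    # hypothetical usage: python strip_chr.py mirbase.gff3 > mirbase.nochr.gff3
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(rename_seqids(handle, strip_chr))
```

For a dictionary-based mapping, pass something like `lambda s: mapping.get(s, s)` as `rename`, where `mapping` is a hypothetical table of UCSC-to-Ensembl names.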
but seqbuster, mirdeep2 still doesn't work.
It was just an indentation bug in config/genomes/BDGP6-resources.yaml; please see the PR. Now smallRNA-seq analysis runs fine for me.
yes, I think it is better to have chr names matching the reference.
Since smallRNA-seq generates its results correctly (as far as I understand them), I left the chromosome names unchanged, so they don't match the chromosome names of the reference. Please let me know if I should change them (e.g., for consistency with the repository's conventions).
Thanks @DrHogart !
I am fine with leaving the chromosome names as is for now, since they produce the right results. In variant analysis, chr/nochr naming was always an issue, but ATAC-seq and miRNA tools may tolerate that difference. I saw Lorena was just linking recipes for GRCh38/hg19, so it worked before for H. sapiens. We are documenting it here; if anybody sees any issues with BDGP6/miRNA, please re-open this issue.
Hi, running
bcbio_nextgen.py upgrade --genomes BDGP6 --datatarget smallrna
results in:
List of genomes to get (from the config file at '{'genomes': [{'dbkey': 'BDGP6', 'name': 'D melangogaster (BDGP6)', 'indexes': ['seq'], 'annotations': ['transcripts']}], 'genome_indexes': ['bwa', 'bowtie2', 'rtg', 'star'], 'install_liftover': False, 'install_uniref': False}'): D melangogaster (BDGP6)
As you can see, there are no miRBase annotation files, only 'transcripts'. The same happens with the dm3 genome. Correspondingly, there was no srnaseq folder after upgrading, so smallRNA-seq analysis doesn't work. At the same time, upgrading with the smallrna datatarget for hg19 gets the miRBase annotations correctly.
bcbio 1.2.0
Could you please add the srnaseq data to the BDGP6 genome resources?
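A quick way to confirm whether an upgrade produced the srnaseq data is to look for the folder and any expected files on disk. This is a hypothetical helper, not a bcbio command. It assumes bcbio's usual layout of `<genomes root>/<organism>/<build>/` with `Dmelanogaster` as the organism directory for BDGP6; both that layout and the file names you pass in are assumptions to adjust to your installation.

```python
import sys
from pathlib import Path


def srnaseq_dir(genomes_root, organism, build):
    """Path where the srnaseq files are assumed to live for a genome build."""
    return Path(genomes_root) / organism / build / "srnaseq"


def missing_files(directory, expected):
    """Names from `expected` that are not regular files inside `directory`."""
    directory = Path(directory)
    return [name for name in expected if not (directory / name).is_file()]


if __name__ == "__main__" and len(sys.argv) >= 2:
    # hypothetical usage: python check_srnaseq.py /path/to/bcbio/genomes mirbase.gff3
    target = srnaseq_dir(sys.argv[1], "Dmelanogaster", "BDGP6")
    if not target.is_dir():
        sys.exit(f"no srnaseq folder at {target}")
    print("missing:", missing_files(target, sys.argv[2:]) or "none")
```

If the folder is missing right after `bcbio_nextgen.py upgrade --genomes BDGP6 --datatarget smallrna`, that matches the behaviour reported above.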