Open etarisal opened 6 years ago
Thanks for your interest.
I'd love to make epic usable on less common builds/genomes, as I know that is a pain point with many other callers.
All you need is a file with the chromosome/unplaced contig names in one column and the sizes in another.
For UCSC this might look like:
chr1 248956422
chr2 242193529
chr3 198295559
chr4 190214555
chr5 181538259
chr6 170805979
chr7 159345973
chrX 156040895
chr8 145138636
chr9 138394717
Then you can invoke epic with -cs <chromsizes_file> and set -egf to a number like 0.8. Setting the egf to an arbitrary number only affects the number of regions considered enriched; it will not find different regions or change the rank order of the results. So if you are interested in the top 1k scoring regions, this will work.
The egf suggestion is just a hack until I am able to get the egf info, which is computationally expensive. Do you have a link to a fasta genome of your organism?
Endre
Also, do you have input/background files? epic needs that to run - just telling you upfront so you do not waste your time :)
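For a genome that only exists as a FASTA file, the two-column chromosome sizes file described above could be generated with a short script. The following is a minimal Python sketch, not part of epic: the function names fasta_to_chromsizes and write_chromsizes are hypothetical, the output is assumed to be tab-separated, and each sequence name is assumed to be the first whitespace-delimited token of the FASTA header, so the names should be checked against those in the alignment files.

```python
def fasta_to_chromsizes(fasta_lines):
    """Yield (name, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Assumption: the sequence name is the first token of the header.
            tokens = line[1:].split()
            name, length = (tokens[0] if tokens else ""), 0
        else:
            length += len(line)
    if name is not None:
        yield name, length


def write_chromsizes(fasta_path, out_path):
    """Write one 'name<TAB>size' line per FASTA record (hypothetical helper)."""
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for name, length in fasta_to_chromsizes(fasta):
            out.write(f"{name}\t{length}\n")
```

The resulting file would then be the one passed to -cs.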
Hi Endre!
Thanks a lot for your help!
I got confused because I was providing the FASTA file with the option -gn (I thought epic needed it XP)... I just removed this option and added the chromsize_file and -egf 0.8 (indeed, this was the fraction of uniquely mapped reads reported by bowtie2), and the program has been running since this morning!!!
This is the link to the salmon genome: https://www.ncbi.nlm.nih.gov/genome/?term=txid8030[orgn] (I also included the unplaced contigs because many genes of interest to me are on these sequences).
Thanks again for your help!
Estefania
Estefania Tarifeño, PhD. Assistant professor Department of Biochemistry and Molecular Biology Faculty of Biological Sciences University of Concepción, Concepción, Chile (+56)(41)2203784
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/biocore-ntnu/epic/issues/76#issuecomment-380688556, or mute the thread https://github.com/notifications/unsubscribe-auth/AkhqHQIPNJ0lKYQvTS2V5FU4musTD5W_ks5tnu3mgaJpZM4TQzU_ .
But epic should be pretty fast. If it has been running for a long time there is something strange going on :/
Really?
It has been printing "Making duplicated bins unique by summing them. (File: count_reads_in_windows, Log level: INFO, Time: Fri, 13 Apr 2018 09:09:52 )" since yesterday.
Is there any mistake?
Thanks for your help!
Estefanía
Are you able to share the files with me? Then I could debug easily. Dropbox link to endrebak85 gmail.com would work.
On Fri, Apr 13, 2018 at 2:15 PM, etarisal notifications@github.com wrote:
This is my script:
epic \
  --treatment infected_A1.bedpe \
  --control control_A1.bedpe \
  --number-cores 6 \
  -egf 0.8 \
  --window-size 200 --gaps-allowed 3 \
  --chromsizes ssa_ref_ICSASG_v2_rename_size.genome \
  --bed condition_A.bed \
  --outfile condition_A.csv
Sure!!! Thank you!
My Dropbox is full, but this Google Drive link should work: https://drive.google.com/drive/folders/17e-93XamXzhJPbtRTo1un-UZLCuyNHdp?usp=sharing
I uploaded a control and a treatment bedpe file, and also my chromosome_size file (with unplaced contigs). Let me know if you need any other information...
Thanks again for your help! :D Estefanía
Thanks. I haven’t tried it on assemblies/contigs before.
Endre
OK! Maybe that is the reason why it is taking so long... I see that epic is at least calculating (something) and using about 2-4 cores, depending on when I check (using top). I am running the same dataset on my computer... But let me know if you have any suggestions to speed up the run!
Thank you! Estefanía
Hmm, typical genomes have ~25 chromosomes; with your contigs you have 232155. This might be why it takes so long. Is it possible to run it only on the canonical chromosomes?
I will think more about it, I promise.
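To check up front whether the number of sequences is the likely culprit, the chromosome sizes file can be summarized before running epic. This is a minimal Python sketch that assumes a whitespace-separated two-column file; summarize_chromsizes and its min_length default are hypothetical and chosen only for illustration.

```python
def summarize_chromsizes(lines, min_length=1_000_000):
    """Return (number of sequences, number of sequences >= min_length)."""
    total = 0
    long_enough = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        total += 1
        if int(fields[1]) >= min_length:
            long_enough += 1
    return total, long_enough
```

A total in the hundreds of thousands with only a few dozen long sequences would suggest that most entries are small unplaced contigs.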
Yep! It ran smoothly using only the canonical chromosomes... I will pick out the most important scaffolds in order to reduce the number of contigs in the annotations.
Thank you for your help,
Cheers,
Estefanía
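Restricting a run to the canonical chromosomes plus a handful of chosen scaffolds means filtering both the chromosome sizes file and the bedpe files to the same set of names. The sketch below is one way to do that in Python, under stated assumptions: filter_chromsizes and filter_bedpe are hypothetical helpers, the set of names to keep is supplied by the user, and the bedpe files are assumed to be tab-separated with the two mate chromosomes in columns 1 and 4.

```python
def filter_chromsizes(lines, keep):
    """Yield chromsizes lines whose sequence name is in the set `keep`."""
    for line in lines:
        fields = line.split()
        if fields and fields[0] in keep:
            yield line


def filter_bedpe(lines, keep):
    """Yield bedpe lines where both mates map to a sequence in `keep`."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: column 1 is chrom1 and column 4 is chrom2.
        if len(fields) >= 4 and fields[0] in keep and fields[3] in keep:
            yield line
```

The filtered files would then replace the originals given to --chromsizes, --treatment and --control in the command shown earlier in the thread.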
Hello, I am working with ChIP-seq data (histone modifications) from Salmo salar, and I would like to ask whether it is possible to request the genome and genome size file for this organism. Or could you help me include the annotation I used for the mapping in the epic run?
The genome annotation I used for the mapping is the one found at NCBI (https://www.ncbi.nlm.nih.gov/genome/369?genome_assembly_id=248466); I also included the "unplaced contigs" in this process. Thanks for your help, Cheers,
Estefania