biocore-ntnu / epic

(DEPRECATED) epic: diffuse domain ChIP-Seq caller based on SICER
http://bioepic.readthedocs.io
MIT License

salmo salar genome request #76

Open etarisal opened 6 years ago

etarisal commented 6 years ago

Hello, I am working with ChIP-seq data (histone modifications) from Salmo salar, and I would like to ask whether it is possible to request the genome and genome size file for this organism. Alternatively, is there any way to include the annotation I used for the mapping in the epic run?

The genome annotation I used for the mapping is the one found at NCBI (https://www.ncbi.nlm.nih.gov/genome/369?genome_assembly_id=248466); I also included the "unplaced contigs" in this process. Thanks for your help. Cheers,

Estefania

endrebak commented 6 years ago

Thanks for your interest.

I'd love to make epic usable on less common builds/genomes, as I know that is a pain point with many other callers.

All you need is a file with the chromosome/unplaced contig names in one column and the sizes in another.

For UCSC this might look like:

chr1    248956422
chr2    242193529
chr3    198295559
chr4    190214555
chr5    181538259
chr6    170805979
chr7    159345973
chrX    156040895
chr8    145138636
chr9    138394717

Then you can invoke epic with -cs <chromsizes_file> and set -egf to a number like 0.8. Setting the egf to an arbitrary number only affects how many regions are considered enriched; it does not change which regions are found or the rank order of the results. So if you are interested in, say, the top 1k scoring regions, this will work.

The egf suggestion is just a hack until I am able to compute the actual egf, which is computationally expensive. Do you have a link to a fasta genome of your organism?
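The two-column chromosome sizes file described above can be derived from the same FASTA that was used for mapping. The following is only a minimal sketch, not something epic ships: the function names `fasta_sizes` and `write_chromsizes` and the toy sequence names are made up for illustration, and it assumes the sequence name is the first whitespace-delimited token of each FASTA header.

```python
import io


def fasta_sizes(handle):
    """Yield (name, length) for each record in a FASTA stream."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Assumption: the name is the first token after '>'.
            fields = line[1:].split()
            name, length = (fields[0] if fields else ""), 0
        else:
            length += len(line)
    if name is not None:
        yield name, length


def write_chromsizes(fasta_handle, out_handle):
    """Write one 'name<TAB>size' line per FASTA record."""
    for name, length in fasta_sizes(fasta_handle):
        out_handle.write(f"{name}\t{length}\n")


# Toy FASTA with hypothetical names; replace with open("genome.fa") in practice.
demo = io.StringIO(">ssa01 demo chromosome\nACGTACGT\nACG\n>contig_1\nACGT\n")
out = io.StringIO()
write_chromsizes(demo, out)
chromsizes_text = out.getvalue()
print(chromsizes_text, end="")
```

The names written to the file have to match the sequence names in the alignment files exactly, so if the reference was renamed before mapping, the same renaming needs to be applied here.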

Endre

endrebak commented 6 years ago

Also, do you have input/background files? epic needs that to run - just telling you upfront so you do not waste your time :)

etarisal commented 6 years ago

Hi Endre!

Thanks a lot for your help!

I got confused because I was providing the fasta file with the option -gn (I thought epic needed it XP)... I just removed this option and added the chromsizes file and -egf 0.8 (indeed, this was the % of uniquely mapped reads reported by bowtie2), and the program has been running since this morning!!!

This is the link for the salmon genome: https://www.ncbi.nlm.nih.gov/genome/?term=txid8030[orgn] (I also included the unplaced contigs because there are many genes of interest to me on these sequences)

Thanks again for your help!

Estefania


Estefania Tarifeño, PhD. Assistant professor Department of Biochemistry and Molecular Biology Faculty of Biological Sciences University of Concepción, Concepción, Chile (+56)(41)2203784


endrebak commented 6 years ago

But epic should be pretty fast. If it has been running for a long time there is something strange going on :/

etarisal commented 6 years ago

Really?

It has been printing "Making duplicated bins unique by summing them. (File: count_reads_in_windows, Log level: INFO, Time: Fri, 13 Apr 2018 09:09:52 )" since yesterday.

This is my script:

epic \
  --treatment infected_A1.bedpe \
  --control control_A1.bedpe \
  --number-cores 6 \
  -egf 0.8 \
  --window-size 200 --gaps-allowed 3 \
  --chromsizes ssa_ref_ICSASG_v2_rename_size.genome \
  --bed condition_A.bed \
  --outfile condition_A.csv

Is there any mistake?

Thanks for your help!

Estefanía



endrebak commented 6 years ago

Are you able to share the files with me? Then I could debug easily. Dropbox link to endrebak85 gmail.com would work.


etarisal commented 6 years ago

Sure!!! Thank you!

I got my Dropbox full, but it should work with this google drive link. https://drive.google.com/drive/folders/17e-93XamXzhJPbtRTo1un-UZLCuyNHdp?usp=sharing

I uploaded a control and a treatment bedpe file, and also my chromosome_size file (with unplaced contigs). Let me know if you need any other information...

Thanks again for your help! :D Estefanía



endrebak commented 6 years ago

Thanks. I haven’t tried it on assemblies/contigs before.

Endre


etarisal commented 6 years ago

OK! Maybe that is the reason why it is taking so long... I see that epic is at least calculating (something) and using about 2-4 cores depending on when I check (using top). I am running the same dataset on my computer... But let me know if you have any suggestions to speed up the run!

Thank you! Estefanía



endrebak commented 6 years ago

Hmm, typical genomes have ~25 chromosomes; with your contigs you have 232155. This might be why it takes so long. Is it possible to run it only on the canonical chromosomes?

I will think more about it, I promise.
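One way to try the canonical-chromosomes-only run is to trim the chromosome sizes file before passing it to epic. The sketch below is an illustration under a stated assumption: that the canonical chromosomes are simply the longest sequences in the file, which is common but not guaranteed (selecting by name is the safer route when the names are known). The helper names and the toy data are hypothetical.

```python
def read_chromsizes(lines):
    """Parse 'name<whitespace>size' lines into a dict, skipping malformed rows."""
    sizes = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            sizes[fields[0]] = int(fields[1])
    return sizes


def keep_largest(sizes, n):
    """Return the n longest sequences, longest first."""
    ranked = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:n])


# Hypothetical toy input: two chromosomes plus three short contigs.
demo_lines = [
    "chrA\t1000\n",
    "contig_1\t12\n",
    "chrB\t800\n",
    "contig_2\t7\n",
    "contig_3\t30\n",
]
all_sizes = read_chromsizes(demo_lines)
canonical = keep_largest(all_sizes, 2)
print(len(all_sizes), "sequences in total;", len(canonical), "kept")
for name, size in canonical.items():
    print(f"{name}\t{size}")
```

Printing the total first is a quick check on how many sequences the file really contains before deciding on a cutoff.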

etarisal commented 6 years ago

Yep! It ran smoothly using only canonical chromosomes... I will pick out the most important scaffolds in order to reduce the number of contigs in the annotation.

Thank you for your help,

Cheers,

Estefanía
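When only a subset of sequences (canonical chromosomes plus a few selected scaffolds) is kept in the chromosome sizes file, the treatment and control BEDPE files can be restricted to the same names. This is a rough sketch rather than part of epic: `filter_bedpe`, the `keep` set, and the toy reads are invented for illustration. It relies only on the standard BEDPE layout, where the two sequence names are in columns 1 and 4, and it drops pairs whose mates fall on different kept or unkept sequences.

```python
def filter_bedpe(lines, keep):
    """Yield BEDPE lines whose two mates both map to sequences in `keep`."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # BEDPE stores the two sequence names in columns 1 and 4.
        if len(fields) >= 6 and fields[0] in keep and fields[3] in keep:
            yield line


# Hypothetical names of the sequences retained in the chromsizes file.
keep = {"chrA", "chrB"}
demo_bedpe = [
    "chrA\t100\t150\tchrA\t300\t350\tread1\t60\t+\t-\n",
    "contig_1\t5\t55\tcontig_1\t80\t130\tread2\t60\t+\t-\n",
    "chrB\t10\t60\tcontig_2\t1\t51\tread3\t60\t+\t-\n",
]
kept_lines = list(filter_bedpe(demo_bedpe, keep))
print("".join(kept_lines), end="")
```

Applying the same filter to both the treatment and the control file keeps the two inputs consistent with each other.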

