biocore-ntnu / epic

(DEPRECATED) epic: diffuse domain ChIP-Seq caller based on SICER
http://bioepic.readthedocs.io
MIT License

cat: write error: Broken pipe #64

Closed romanhaa closed 7 years ago

romanhaa commented 7 years ago

I've been doing some test runs with epic on our data and always receive this error as the first message after running epic -t ... -c ... > ...

cat: write error: Broken pipe

The job runs fine anyway and results are created without problems (as far as I can tell).

You will probably need more information, but until I know exactly what you need, here are the basics: I'm working on a node of a computational cluster running Debian 7 (wheezy) with cat version 8.13.

Let me know if you need any other info.

endrebak commented 7 years ago

Thanks for testing epic. If you could post the output written to the screen, including the error and the command run, that would be helpful :)

romanhaa commented 7 years ago

Of course, here you go :)

Command: epic -t /data/ChIP_H3K4me3.bed -c /data/input.bed > /data/epic_results.csv

Output:

# epic -t /data/ChIP_HK4me3.bed -c /data/input.bed # epic_version: 0.2.5, pandas_version: 0.20.3 (File: epic, Log level: INFO, Time: Thu, 07 Sep 2017 10:09:42 )
cat: write error: Broken pipe
Used first 10000 reads of /data/ChIP_HK4me3.bed to estimate a median read length of 51.0
Mean readlength: 51.0018, max readlength: 54, min readlength: 50. (File: find_readlength, Log level: INFO, Time: Thu, 07 Sep 2017 10:09:42 )
Using an effective genome fraction of 0.863088714089. (File: genomes, Log level: INFO, Time: Thu, 07 Sep 2017 10:09:42 )
Binning /data/PGP/ChIP_HK4me3.bed (File: run_epic, Log level: INFO, Time: Thu, 07 Sep 2017 10:09:42 )
Binning chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, M, X, Y (File: count_reads_in_windows, Log level: INFO, Time: Thu, 07 Sep 2017 10:09:42 )
Merging the bins on both strands per chromosome. (File: count_reads_in_windows, Log level: INFO, Time: Thu, 07 Sep 2017 10:13:11 )
Binning /data/PGP/input.bed (File: run_epic, Log level: INFO, Time: Thu, 07 Sep 2017 10:13:50 )
Binning chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, M, X, Y (File: count_reads_in_windows, Log level: INFO, Time: Thu, 07 Sep 2017 10:13:50 )
Merging the bins on both strands per chromosome. (File: count_reads_in_windows, Log level: INFO, Time: Thu, 07 Sep 2017 10:15:19 )
Merging ChIP_HK4me3 and Input data. (File: helper_functions, Log level: INFO, Time: Thu, 07 Sep 2017 10:15:57 )
2671858539.0 effective_genome_fraction (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 07 Sep 2017 10:16:22 )
200 window size (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 07 Sep 2017 10:16:22 )
31284221 total ChIP_HK4me3 count (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 07 Sep 2017 10:16:22 )
2.34175728568 average_window_readcount (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 07 Sep 2017 10:16:22 )
5 island_enriched_threshold (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 07 Sep 2017 10:16:22 )
3.49853429423 gap_contribution (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 07 Sep 2017 10:16:22 )
0.475623459198 boundary_contribution (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 07 Sep 2017 10:16:22 )
Finding the score required to consider an island enriched. (File: compute_score_threshold, Log level: INFO, Time: Thu, 07 Sep 2017 10:16:22 )
Computing cumulative distribution. (File: compute_score_threshold, Log level: DEBUG, Time: Thu, 07 Sep 2017 10:16:29 )
Enriched score threshold for islands: 22.162 (File: compute_score_threshold, Log level: INFO, Time: Thu, 07 Sep 2017 10:16:29 )
Giving bins poisson score. (File: count_to_pvalue, Log level: INFO, Time: Thu, 07 Sep 2017 10:16:29 )
Clustering bins into islands. (File: find_islands, Log level: INFO, Time: Thu, 07 Sep 2017 10:16:36 )
Done finding islands. (File: run_epic, Log level: INFO, Time: Thu, 07 Sep 2017 10:20:27 )
Concating dfs. (File: run_epic, Log level: INFO, Time: Thu, 07 Sep 2017 10:20:27 )
Labeling island bins. (File: run_epic, Log level: INFO, Time: Thu, 07 Sep 2017 10:20:27 )
Computing FDR. (File: run_epic, Log level: INFO, Time: Thu, 07 Sep 2017 10:20:27 )
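
As a side note on the DEBUG lines in this log, the reported average_window_readcount appears to be consistent with total reads times window size divided by the effective genome size. The sketch below only re-does that arithmetic with the logged numbers; the function name is hypothetical and this is not taken from epic's source. Note that the value 2671858539.0 is logged under the label "effective_genome_fraction", but it is treated here as a size in base pairs, which is an assumption.

```python
# Values copied from the DEBUG lines of the log above.
# Assumption: the number logged as "effective_genome_fraction" is really
# the effective genome size in base pairs.
EFFECTIVE_GENOME_SIZE = 2671858539.0
WINDOW_SIZE = 200
TOTAL_CHIP_READS = 31284221


def average_window_readcount(total_reads, window_size, genome_size):
    """Expected number of reads per window if reads were spread uniformly.

    Hypothetical helper for illustration; not epic's actual function.
    """
    return total_reads * window_size / genome_size


lam = average_window_readcount(TOTAL_CHIP_READS, WINDOW_SIZE, EFFECTIVE_GENOME_SIZE)

if __name__ == "__main__":
    # The log reports 2.34175728568 for average_window_readcount.
    print(lam)
```

With roughly 2.3 reads expected per 200 bp window, the logged island_enriched_threshold of 5 is about twice the background expectation.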

endrebak commented 7 years ago

It doesn't matter; it is just cat complaining, for some reason, while running the command cat /data/ChIP_H3K4me3.bed | head -10000. Perhaps you can reproduce the error by trying the following?

cat /data/ChIP_H3K4me3.bed | head -10000 > /dev/null
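
For readers wondering where the message comes from: in a pipeline like the one above, head exits once it has read its 10000 lines, and a writer that keeps writing to the now-closed pipe gets a "broken pipe" error (EPIPE). The sketch below reproduces that situation in a single process on a POSIX system. It is an illustration of the general mechanism, not epic's code, and the function name is made up. Whether the writer dies silently or prints an error depends on how SIGPIPE is handled in the process that launched it, which this thread does not establish.

```python
import errno
import os


def write_to_closed_pipe() -> bool:
    """Return True if writing to a pipe with no reader fails with EPIPE."""
    read_fd, write_fd = os.pipe()
    # Stand-in for `head` exiting after it has read enough lines:
    # the reading end of the pipe goes away.
    os.close(read_fd)
    try:
        # Stand-in for `cat` still trying to write more data.
        os.write(write_fd, b"chr1\t100\t151\n")
    except BrokenPipeError as exc:
        # CPython ignores SIGPIPE by default, so the failed write surfaces
        # as an exception (errno EPIPE) instead of killing the process.
        return exc.errno == errno.EPIPE
    finally:
        os.close(write_fd)
    return False


if __name__ == "__main__":
    print("broken pipe reproduced:", write_to_closed_pipe())
```

The error is harmless for the reader side: head already has all the lines it asked for, which matches the observation that the job runs fine.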

romanhaa commented 7 years ago

That actually doesn't return an error, but anyway good to know that it's not a problem of the tool. We are having some unexpected issues with our system every now and then so it wouldn't be very surprising to me if the problem is on our end :)
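
For context, the log line "Used first 10000 reads ... to estimate a median read length" describes the step that reads only the head of the file. A pipe-free way to do the same kind of estimate is sketched below, assuming a tab-separated BED file where read length is end minus start. The function name, the header-skipping rule and the default of 10000 are illustrative assumptions, not epic's implementation.

```python
from itertools import islice
from statistics import median


def estimate_read_length(bed_path: str, n_reads: int = 10000) -> float:
    """Median of (end - start) over the first n_reads lines of a BED file.

    Hypothetical helper for illustration; not epic's actual function.
    """
    lengths = []
    with open(bed_path) as bed:
        # islice stops after n_reads lines; no subprocess and no pipe,
        # so there is nothing that can report a broken pipe.
        for line in islice(bed, n_reads):
            fields = line.split("\t")
            # Skip header-like or malformed lines (they still count
            # toward the n_reads lines inspected).
            if len(fields) < 3 or line.startswith(("#", "track", "browser")):
                continue
            lengths.append(int(fields[2]) - int(fields[1]))
    return float(median(lengths))
```

Called as estimate_read_length("/data/ChIP_H3K4me3.bed"), this would inspect the same first 10000 lines as the cat | head command, without starting any external process.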

Since I'm already here, I noticed that epic finds ~50,000 peaks for our H3K4me3 data sets, which is quite a high number. I spoke with some colleagues and they usually have around 25,000 in their samples. It could be a sample quality issue though since we are preparing the ChIP on highly fixed samples and can see the difference in quality compared to fresh tissue. Would you recommend epic for H3K4me3 anyway? Or is the signal not broad enough?

endrebak commented 7 years ago

I often use MACS2 for H3K4me3. Depends on the dataset.

romanhaa commented 7 years ago

Ok, thanks a lot and keep up the good work :) Peak calling with epic really is much faster compared to SICER!

endrebak commented 7 years ago

Thanks. I mostly wanted something that was easier to use, but the speedup is nice. If you had used the -cpu flag, your analyses would have finished a lot quicker; the speedup is almost linear per core, IIRC.