StevenWingett / HiCUP

Hi-C data processing pipeline
GNU Lesser General Public License v3.0
31 stars 11 forks source link

Fix/Improve CI #85

Closed lldelisle closed 1 year ago

lldelisle commented 1 year ago

Hi, I noticed that the CI was failing because the link to fasta and bowtie2 is broken. I tried to find another one (GRCh38_noalt) but it seems it is different.

I also added simple tests to check all perl scripts are executable.

(This PR integrates #83 )

lldelisle commented 1 year ago

Youhou. It passes. I think it would be good to have more tests but this is already something to start with (you can squash all commits).

StevenWingett commented 1 year ago

Many thanks for doing this, it is very helpful!

1) Just wondering, is this commit adding the output from the test run to the HiCUP Git repo (i.e. the Binary HiCUP Digest file etc)? We don't want to add large files to the repo.

lldelisle commented 1 year ago

Yes it is but the whole directory is 4.8MB:

[ldelisle@SV-49-002 HiCUP]$ ls -alh Testing/test_output/
total 4.8M
drwxr-xr-x 2 ldelisle ldelisle 4.0K Feb 21 10:01 .
drwxr-xr-x 3 ldelisle ldelisle 4.0K Feb 21 10:01 ..
-rw-r--r-- 1 ldelisle ldelisle 216K Feb 21 10:01 Digest_Human38chr21and22_HindIII_None_13-45-10_16-02-2023.txt.gz
-rw-r--r-- 1 ldelisle ldelisle 1007 Feb 21 10:01 HiCUP_summary_report_lHWLvpbSoZ_13-54-49_16-02-2023.txt
-rw-r--r-- 1 ldelisle ldelisle  32K Feb 21 10:01 test_dataset1_2.hicup.bam
-rw-r--r-- 1 ldelisle ldelisle 4.5M Feb 21 10:01 test_dataset1_2.hicup.bam.HiCUP_summary_report_lHWLvpbSoZ_13-54-49_16-02-2023.html

And the largest is due to the html which I can remove.

StevenWingett commented 1 year ago

Hi again,

Okay, but I was wondering why we need to store these binary files.. The Digest file is made by hicup_digester after downloading the chromosome FASTA files when running the GitHub actions We don't need to store the BAM file or HTML file. All we need is the expected results in the summary text file, which is tiny. I want to keep the HiCUP repo a small as possible and free from binary files.

Sorry if I am misunderstanding this update. Could we implement the new excellent Github actions pipeline without committing binary files to the repo?

Many thanks for all your help!

Best,

Steven

lldelisle commented 1 year ago

No problem. I don't want to get the html. I thought that I could keep the bam to be able to use it to test converters. But I did not have time to implement the tests.

lldelisle commented 1 year ago

The test output can also be store somewhere else.

StevenWingett commented 1 year ago

Testing the converters is a good idea (the more unit testing the better). This possibly could be done using the BAM file that is made during Github actions processing, but if not we could even host a BAM file on a separate server if need be. What I don't want is BAM files (and other binary files) in the Git repo.

Is it possible to re-do the pull request without adding those binary files. I'd like to merge that into the development branch.

Thanks very much !

StevenWingett commented 1 year ago

Thanks! Would it be possible to remove the other binary file: Testing/test_output/Digest_Human38chr21and22_HindIII_None_13-45-10_16-02-2023.txt.gz and then I'll commit to the repo.

Thanks very much, Steven

lldelisle commented 1 year ago

Sorry, sure

lldelisle commented 1 year ago

Thanks, (you did not squashed before merging therefore the commits with the bam are still present in the repository, I hope this is not an issue)

StevenWingett commented 1 year ago

Oh dear, I clicked merge too quickly. That's just doubled the size of the repo. I'll have to put up with it as the git repo is live and deleting live commits is potentially a problem. Well, it's good to have the Unit test working again anyway.

Thanks, Steven