fmalmeida / bacannot

Generic but comprehensive pipeline for prokaryotic genome annotation and interrogation with interactive reports and shiny app.
https://bacannot.readthedocs.io/en/latest/
GNU General Public License v3.0

provide a small dataset for quick code tests #46

Closed: fmalmeida closed this issue 2 years ago

fmalmeida commented 2 years ago

Provide a small dataset to quickly check the code integrity.

The pipeline already has a test dataset, run with -profile test, that checks the majority of processes. However, it is not very small and generally takes ~1-2 h to finish.

So, when updating code, we usually want something quicker just to check its integrity. It would therefore be nice to have a very small dataset that allows this in less than 40 min.

Task:

abhi18av commented 2 years ago

Agreed, this would be super helpful even on Github CI/Actions tests.

fmalmeida commented 2 years ago

Found out that a run with the Haemophilus influenzae genome takes only 9 min to finish and tests almost all modules; it only skips the assembly modules and the methylation calling module. Since the pipeline runs all the other modules successfully with it, this looks like a good dataset for quickly testing the code integrity as well as the modules executed.

Now it just needs to be added to develop as a new test profile, followed by a new patch release.

fmalmeida commented 2 years ago

It has been added to develop in commit: https://github.com/fmalmeida/bacannot/commit/ed6f51a14903dcbd84b621387787892c26d08ec9

However, the URLs inside these configs already point to the master branch, so they will only work once this branch is merged into master.

After that, I'll need to learn a bit more about GitHub Actions to make these checks run automatically.
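A CI job for such a check could simply call a small wrapper around Nextflow that fails when the run breaks or exceeds the 40-minute goal. The sketch below is hypothetical: the profile name quicktest, the docker container profile and the develop revision are assumptions, not names confirmed in this thread, and the test data URLs will only resolve once develop is merged into master.

```python
import shutil
import subprocess
import sys
from typing import List

# Hypothetical values: the profile name is assumed, not taken from the thread.
PIPELINE = "fmalmeida/bacannot"
QUICK_PROFILE = "quicktest"   # assumed name of the small-dataset profile
TIME_BUDGET_S = 40 * 60       # the "< 40 min" goal stated in the issue


def build_command(container: str = "docker", revision: str = "develop") -> List[str]:
    """Assemble the nextflow command for a quick integrity check."""
    return [
        "nextflow", "run", PIPELINE,
        "-r", revision,                              # branch or tag to run
        "-profile", f"{QUICK_PROFILE},{container}",  # test data + container engine
    ]


def run_quick_check(cmd: List[str], timeout_s: int = TIME_BUDGET_S) -> int:
    """Run the command and return its exit code.

    Returns 127 if the executable is not on PATH and 124 if the run
    exceeds the time budget (subprocess.run kills the child on timeout).
    """
    if shutil.which(cmd[0]) is None:
        print(f"{cmd[0]} not found on PATH", file=sys.stderr)
        return 127
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        print(f"quick test exceeded {timeout_s} s", file=sys.stderr)
        return 124
```

A CI step could then end with sys.exit(run_quick_check(build_command())), so that a failed or overly slow run marks the job as failed.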

abhi18av commented 2 years ago

This sounds great @fmalmeida! Do we have a tentative date for the next release (or merge)?

fmalmeida commented 2 years ago

Hi @abhi18av,

For the bigger release, related to issue #36 and draft PR #44, I don't have a forecast yet, because I haven't found much time to spend on these implementations now that the pipeline is stable. As you can see, the "remodelling" branch (related to issue #36) is progressing very slowly and may take a good while to finish.

However, the smaller changes, the ones addressed in the issues you contributed to and already merged into develop, look stable: in my latest tests the branch already seems stable. I just want to test it two more times with a few of my own datasets before bringing it forward.

For these smaller changes, I think a new release, 3.0.1, could be published within the next two weeks.

😄

abhi18av commented 2 years ago

Ah, okay, this makes sense Felipe, not a problem. Time is a limited resource 😉

But thinking further about this, I think an nf-core/modules-like approach to testing (pytest) might make more sense now that we have a smaller dataset.

This would ideally be done in conjunction with the possible refactoring of modules.
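For reference, nf-core/modules drove these tests at the time with pytest-workflow: a YAML file declares the command to run and the expected output files, optionally with their md5 checksums. The core of that check can be sketched in plain Python as below. This is an illustration of the idea, not pytest-workflow's own code, and the functions md5sum and check_outputs are hypothetical helpers.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Optional


def md5sum(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_outputs(outdir: Path, expected: Dict[str, Optional[str]]) -> List[str]:
    """Compare pipeline outputs against expectations; return the failures.

    `expected` maps a path relative to `outdir` to an MD5 digest, or to
    None when only the file's existence should be checked (useful for
    outputs containing dates or versions that change between runs).
    """
    failures = []
    for rel_path, want in expected.items():
        path = outdir / rel_path
        if not path.is_file():
            failures.append(f"missing: {rel_path}")
        elif want is not None and md5sum(path) != want:
            failures.append(f"md5 mismatch: {rel_path}")
    return failures
```

A pytest test could then run the quick profile and assert that check_outputs(Path("results"), expected) == [], where the output directory and the expected mapping are hypothetical and would have to be filled in from a known-good run.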

But no hurry. Sometimes, more than engineering, a pipeline (or product) needs users to guide its overall development :)

fmalmeida commented 2 years ago

I don't know much about the pytest setup that nf-core uses in their pipelines/modules, but I think it is worth learning anything that makes testing easier.

If you could point me to such examples and how they are done or configured, so I can learn more about them, that would be nice.

No worries, every input is valuable, and I am pleased about the discussions and inputs you've brought to me 😄

And yes, I agree with you: these automated testing implementations could be done in conjunction with the refactoring of modules 😄

abhi18av commented 2 years ago

If you could point me to such examples and how they are done or configured, so I can learn more about them, that would be nice.

Sure, Felipe 👍

Actually, I saw this practice introduced by the nf-core folks; it is documented in the talks here

Beyond nf-core, Robert made an independent effort for Bactopia (https://github.com/bactopia/bactopia), relying on the test data at https://github.com/bactopia/bactopia-tests

However if you have other ideas in mind, I'd be happy to discuss and try it out with you :)

fmalmeida commented 2 years ago

Hi @abhi18av,

Many thanks for pointing out these sources. I will definitely take the time to read and study them.

Knowing how to speed up and automate tests will be great for this repository in particular and also for my future work.

Thank you.

😁😁