Packaging and folder structure

CITCOM-project / causcumber

Cucumber driven causal inference for testing computational models.

Packaging and folder structure #12

Open bobturneruk opened 3 years ago

bobturneruk commented 3 years ago

I suggest that this repo is organised as follows:

root/
    experiments/  (runs to refer to in a paper)
    scenarios/    (specific examples)
    causcumber/   (the generalisable part of the code)
    setup.py      (incorporates requirements)

Then to work on the project one would (from the repo root):

pip install -e .

which would install causcumber in editable mode, allowing the generalisable parts to be imported from anywhere in the repo.

This structure would facilitate pushing to PyPI.
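
As a rough illustration of the layout proposed above, the sketch below creates the folders and a minimal setup.py in a scratch directory. Only the folder names, the package name and the use of setup.py come from this thread; the `scaffold` helper, the version string and the empty `install_requires` list are placeholders, not the project's actual configuration.

```python
import tempfile
from pathlib import Path

# Minimal setup.py for the proposed layout. The version and the empty
# install_requires list are hypothetical placeholders.
SETUP_PY = '''\
from setuptools import find_packages, setup

setup(
    name="causcumber",
    version="0.0.1",  # hypothetical placeholder
    packages=find_packages(include=["causcumber", "causcumber.*"]),
    install_requires=[],  # the project's Python requirements would be listed here
)
'''


def scaffold(root):
    """Create the proposed folders and setup.py under `root`.

    Returns the sorted relative paths of everything found under `root`.
    """
    root = Path(root)
    for folder in ("experiments", "scenarios", "causcumber"):
        (root / folder).mkdir(parents=True, exist_ok=True)
    # An __init__.py makes causcumber/ an importable package.
    (root / "causcumber" / "__init__.py").touch()
    (root / "setup.py").write_text(SETUP_PY)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


if __name__ == "__main__":
    # Build the layout in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold(tmp):
            print(path)
```

Restricting `find_packages` to `causcumber` keeps `experiments/` and `scenarios/` out of anything that eventually gets pushed to PyPI.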

bobturneruk commented 3 years ago

R dependencies (https://github.com/CITCOM-project/causcumber/blob/be021a51641674554927fcb5871d6575b79b2ed9/causcumber_utils.py#L211) may need special consideration

jmafoster1 commented 3 years ago

This seems sensible. In my experience, GitHub can get a bit annoying when you start generating gigabytes of experimental data, so maybe it would be best to keep that local for now, at least until we have the final datasets. Or it might be better to use something like ORDA and just post the lot up there when we're done.

The R dependencies are a bit of a nightmare. I had an issue with that at the weekend when I was working on my laptop. Behave captures output by default, so I had no idea it was waiting for me to give it permission to install the R dependencies. We only use R because Dagitty is so much faster than DoWhy at producing the causal estimates, so it may be worth looking into the algorithm Dagitty uses and reimplementing it in Python, depending on how difficult or time-consuming that would be.
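
One way to avoid the hidden install prompt might be to check for the R packages up front and fail with a clear message instead of waiting for input. The sketch below is only an assumption about how that could look: it shells out to `Rscript` rather than using whatever causcumber_utils.py currently does, and the helper names `r_package_available` and `require_r_packages` are made up for illustration.

```python
import shutil
import subprocess

# R one-liner: exit status 0 if the package can be loaded, 1 otherwise.
R_CHECK = 'quit(status = if (requireNamespace("{pkg}", quietly = TRUE)) 0 else 1)'


def r_package_available(pkg, run=subprocess.run, which=shutil.which):
    """Return True if the R package `pkg` is installed, without ever prompting."""
    if which("Rscript") is None:
        return False  # R itself is not on the PATH
    result = run(
        ["Rscript", "-e", R_CHECK.format(pkg=pkg)],
        stdin=subprocess.DEVNULL,  # never block waiting for user input
        capture_output=True,
    )
    return result.returncode == 0


def require_r_packages(packages, check=r_package_available):
    """Raise a readable error listing any R packages that are missing."""
    missing = [pkg for pkg in packages if not check(pkg)]
    if missing:
        raise RuntimeError(
            "Missing R packages: " + ", ".join(missing)
            + ". Install them in R before running the tests."
        )
```

Calling `require_r_packages(["dagitty"])` from Behave's `before_all` hook in environment.py would surface the problem before any scenario runs, even with output capture switched on.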

bobturneruk commented 3 years ago

Yeah, we don't want GBs of data in here. Another repo may be a temporary solution. git-annex and DVC may also be worth exploring.
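
For anyone unfamiliar with these tools, the general idea is that git only tracks a small pointer file containing a content hash, while the data itself lives somewhere else. The sketch below illustrates that idea with the standard library only; it is not the file format or API of DVC or git-annex, and the function names and the `.pointer.json` suffix are invented for the example.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_file):
    """Write a small JSON pointer next to `data_file`; commit this, not the data."""
    data_file = Path(data_file)
    pointer = data_file.with_name(data_file.name + ".pointer.json")
    meta = {
        "path": data_file.name,
        "sha256": file_sha256(data_file),
        "size": data_file.stat().st_size,
    }
    pointer.write_text(json.dumps(meta, indent=2))
    return pointer


def verify_pointer(pointer):
    """Return True if the data file beside the pointer still matches its hash."""
    pointer = Path(pointer)
    meta = json.loads(pointer.read_text())
    data_file = pointer.with_name(meta["path"])
    return data_file.exists() and file_sha256(data_file) == meta["sha256"]
```

With something like this, the data files themselves would go in .gitignore and only the pointers would be committed, which keeps the repo small whichever storage backend ends up holding the data.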

jmafoster1 commented 3 years ago

DVC looks quite good, especially if we can hook it up to Google docs somehow. I think much of our "big data" will come from Bessemer, so it'd be nice to have a more efficient way of getting the data off there without having to zip it, scp it, unzip it, and then commit it somewhere.

I was also saying to Andy earlier that I think we should keep our academic evaluation separate from the tool, so that it's simpler and more streamlined for people who subsequently want to download and actually use it to test their own models. I'm still not entirely sure what the "tool" will end up being. The main causcumber contribution so far seems to be an aggregation and application of different existing techniques.