RMI-PACTA / pacta.data.preparation

The goal of {pacta.data.preparation} is to prepare and format all input datasets required to run the PACTA for investors tools.
https://rmi-pacta.github.io/pacta.data.preparation/

Make this repo public #9

Closed: jdhoffa closed this 4 months ago

jdhoffa commented 1 year ago

if this lands first, that should put a fire under this https://github.com/RMI-PACTA/pacta.scenario.preparation/issues/123

this will facilitate https://github.com/RMI-PACTA/workflow.data.preparation/issues/102

AB#10435

cjyetman commented 7 months ago

This might be achievable once RMI-PACTA/archive.pacta.data.preparation#321 is resolved.

AlexAxthelm commented 6 months ago

With RMI-PACTA/archive.pacta.data.preparation#335 and RMI-PACTA/archive.pacta.data.preparation#336 closed, and RMI-PACTA/archive.pacta.data.preparation#338 waiting on merge, it's probably time to revive this thread 🥳

cjyetman commented 6 months ago

yup! I'd appreciate thorough consideration of this from @AlexAxthelm and @jdhoffa now that (I think) we've gotten rid of any proprietary FactSet stuff.

I guess a show of consensus here in this issue is adequate, after which I'll flip the switch in the settings.

jdhoffa commented 6 months ago

Well, given that the FactSet data will forever live in the git history, I guess "flipping the switch" isn't really an option? Unless you scrub the history?

@AlexAxthelm thoughts?

cjyetman commented 6 months ago

true.... sorry, I should have been clearer. My memory of previous conversations is that we were keeping this private out of an abundance of caution: the possibly sensitive FactSet data was actually very minimal and not much of a concern, so we thought just getting it off main would be adequate

but also willing to go the "proper" route if that is what is currently preferred

jdhoffa commented 6 months ago

Yes, I also agree that the data LIKELY is fine to be public, and we are being very cautious.

I just wanted to make it clear that if we set the setting to public, we are still exposing that dataset, so we need to ensure we are comfortable with that.

Personally, I am comfortable with that 👍

cjyetman commented 6 months ago

Thanks for making it explicit in this conversation

AlexAxthelm commented 6 months ago

To be clear about the options (and summarize prior discussions about similar repos):

Of these, the first is the simplest, but it leaves the pesky sensitive history available to anyone who wants to inspect it. The last option is potentially attractive because most of the history is still available, but it would still break any commit-specific references (links; tags can be retagged), since all the SHAs would change (and any signed commits would become unsigned, though we don't worry too much about that).
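To illustrate why rewriting history breaks commit-specific references, here is a minimal local sketch, assuming the "last option" means rewriting history to strip the sensitive files. The file name sensitive.csv is hypothetical, and git filter-branch is used only because it ships with git; git's own documentation now recommends the separate git filter-repo tool for this kind of job.

```sh
#!/bin/sh
# Sketch: rewriting history to drop one file changes every commit SHA
# from the commit where that file first appeared onward.
# Assumption: sensitive.csv is a hypothetical stand-in for the proprietary data.
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1  # skip filter-branch's deprecation pause

repo=$(mktemp -d)
cd "$repo"
git init -q

echo "id,value" > sensitive.csv   # hypothetical sensitive file
echo "code v1" > R.txt
git add . && git commit -q -m "first commit, includes sensitive data"
echo "code v2" > R.txt
git commit -q -am "second commit"
git rm -q sensitive.csv && git commit -q -m "remove sensitive data from main"

before=$(git rev-list HEAD)   # SHAs of the original history

# Rewrite every commit on the current branch so sensitive.csv never existed.
git filter-branch -f --index-filter \
  'git rm -q --cached --ignore-unmatch sensitive.csv' HEAD >/dev/null 2>&1

after=$(git rev-list HEAD)    # SHAs of the rewritten history

echo "before:"; echo "$before"
echo "after:";  echo "$after"
```

Because each commit's SHA covers its tree and its parent's SHA, changing the first commit changes every descendant. Note that filter-branch keeps the old history under refs/original/ locally until those refs are deleted and the repository is garbage-collected.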

I might be overly cautious, but I'm inclined to go with the second option, and put the current state in pacta.data.preparation.archive or something like that, and start fresh.
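The "start fresh" option can be sketched locally as an orphan branch: the files stay identical (same tree hash), but the single new commit has no parents and therefore a new SHA. This is a minimal sketch assuming it runs in a throwaway clone; the branch name fresh-main is hypothetical.

```sh
#!/bin/sh
# Sketch of "start fresh": keep today's files, drop all history.
# Assumption: fresh-main is a hypothetical branch name in a throwaway repo.
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "v1" > a.txt && git add a.txt && git commit -q -m "old 1"
echo "v2" > a.txt && git commit -q -am "old 2"

old_commit=$(git rev-parse HEAD)
old_tree=$(git rev-parse 'HEAD^{tree}')

# An orphan branch has no parent commit; the index still holds HEAD's files.
git checkout -q --orphan fresh-main
git commit -q -m "Initial commit (history archived elsewhere)"

new_commit=$(git rev-parse HEAD)
new_tree=$(git rev-parse 'HEAD^{tree}')
echo "old commit $old_commit -> new commit $new_commit (same tree $new_tree)"
```

Unlike the shallow-clone approach, nothing ties the new root commit back to the archived repo except its commit message.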

cc @hodie for input.

jdhoffa commented 6 months ago

A glorious tech review topic!

AlexAxthelm commented 6 months ago

Additional note: git clone --depth=1 lets us keep the same commit SHA for our starting point while still dropping history:

sh-3.2$ pwd
/Users/aaxthelm/Documents/pacta/pacta.data.preparation

sh-3.2$ git log --oneline -10
9e57e0d (HEAD -> main, origin/main, origin/HEAD) remove old `input` and `output` directories (#339)
1df2d07 remove `factset_manual_pacta_sector_override` (#338)
554c8b0 (origin/reduce-mem) remove `factset_industry_map_bridge` (#336)
0969f1d remove `factset_issue_code_bridge` (#335)
530af7a replace `dplyr::group_indices` with `dplyr::cur_group_id()` (#332)
0d3a49e include information about external software with `get_sessionInfo()` (#330)
6858432 fix spelling mistake :facepalm: (#329)
5f5f2f4 allow for passing a FactSet specific directory to `write_manifest()` (#328)
3865952 Ref new pacta.pkgdown.rmitemplate (#326)
e5931e8 minor formatting fix (#324)

sh-3.2$ git clone --depth=1 git@github.com:RMI-PACTA/pacta.data.preparation.git ~/Downloads/foo
Cloning into '/Users/aaxthelm/Downloads/foo'...
remote: Enumerating objects: 73, done.
remote: Counting objects: 100% (73/73), done.
remote: Compressing objects: 100% (67/67), done.
remote: Total 73 (delta 4), reused 37 (delta 3), pack-reused 0
Receiving objects: 100% (73/73), 839.76 KiB | 1.76 MiB/s, done.
Resolving deltas: 100% (4/4), done.

sh-3.2$ cd !$
cd ~/Downloads/foo

sh-3.2$ pwd
/Users/aaxthelm/Downloads/foo

sh-3.2$ git log --oneline -10
9e57e0d (grafted, HEAD -> main, origin/main, origin/HEAD) remove old `input` and `output` directories (#339)

So we could pick up where we left off and have a clear point of connection between the two repos (we might have done this last time; I can't recall).
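The shallow-clone behaviour in the session above can be reproduced with a throwaway repository. This is a minimal sketch: the temporary directories are arbitrary, and git ignores --depth for plain local paths, so a file:// URL is used. Whether a hosting service will accept a push of the resulting grafted history into a new repository is a separate question this sketch does not test.

```sh
#!/bin/sh
# Sketch: a --depth=1 clone keeps the tip commit's SHA but drops its ancestors.
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

src=$(mktemp -d)
dst=$(mktemp -d)/shallow

git init -q "$src"
cd "$src"
for i in 1 2 3; do
  echo "$i" > f.txt
  git add f.txt
  git commit -q -m "commit $i"
done
src_head=$(git rev-parse HEAD)

# --depth is ignored for plain local paths, so clone through a file:// URL.
git clone -q --depth=1 "file://$src" "$dst"
cd "$dst"
dst_head=$(git rev-parse HEAD)
dst_count=$(git rev-list --count HEAD)   # only the grafted tip is reachable

echo "source HEAD:  $src_head"
echo "shallow HEAD: $dst_head ($dst_count commit reachable)"
```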

cjyetman commented 6 months ago

To me the history (both commit history and issues/PRs) is important and very useful. I recognize that other things matter here too, but I want to make sure the importance of the history is acknowledged.

jdhoffa commented 6 months ago

I have added a Tech Review topic (to be filled in) here

I would suggest we discuss and decide there, so this doesn't just float around with no decision :-)

cjyetman commented 5 months ago

@AlexAxthelm I'm more interested in getting this done than saving the commit history and issues/PRs... can you help do the "hard fork" process?

AlexAxthelm commented 5 months ago

Sure. The repo is ready?

Steps I'll take:

cjyetman commented 5 months ago

great, thanks! we should also transfer all the current issues... would it be feasible to leave RMI-PACTA/archive.pacta.data.preparation public long enough to transfer issues and then make it private?
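One possible way to script the bulk issue transfer is with the GitHub CLI, assuming gh is installed and authenticated with access to both repositories. GitHub does not allow transferring an issue from a private repository to a public one, which is why the archive would need to stay public during the transfer. The helper name transfer_open_issues and the --limit value are assumptions, and this sketch only moves open issues.

```sh
#!/bin/sh
# Sketch, assuming an authenticated GitHub CLI (gh).
# transfer_open_issues is a hypothetical helper: it moves every open issue
# from one repository to another, one issue at a time.
transfer_open_issues() {
  from=$1  # e.g. RMI-PACTA/archive.pacta.data.preparation
  to=$2    # e.g. RMI-PACTA/pacta.data.preparation
  for n in $(gh issue list --repo "$from" --state open --limit 1000 \
               --json number --jq '.[].number'); do
    gh issue transfer "$n" "$to" --repo "$from"
  done
}

# Usage:
# transfer_open_issues RMI-PACTA/archive.pacta.data.preparation RMI-PACTA/pacta.data.preparation
```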

AlexAxthelm commented 5 months ago

added to my list.

jdhoffa commented 4 months ago

In a discussion with @cjyetman we decided that this process should occur after scenario preparation is public. Depends on https://github.com/RMI-PACTA/workflow.scenario.preparation/issues/9 and https://github.com/RMI-PACTA/workflow.scenario.preparation/issues/10

cjyetman commented 4 months ago

@jdhoffa I think I'm ready to pull the trigger on this. maybe we can do it together? or?

I'd like to:

jdhoffa commented 4 months ago

Closed by the existence of this repo 🕺