econ-ark / OverARK

Project management and administration for econARK

Re-organize econ-ark information #13

Closed shaunagm closed 4 years ago

shaunagm commented 5 years ago

I'm putting a pause on this until I've got a few more weeks with the project, at least, but I've already made some notes on how to improve our information organization so it's more logically laid out, more publicly accessible, and less redundant.

Key steps include:

We'll also need to update the website (see issue #27) and re-organize documentation (see issue #16).

Stuff that needs to move to separate sphinx documentation repo or to restructured HARK/documentation branch:

shaunagm commented 5 years ago

List of repositories

| Repo Name | Description | Keep/Delete | Public/Private |
| --- | --- | --- | --- |
| Hark | HARK code | keep | public |
| Remark | "Replications and Explorations Made using the ARK" | keep | public |
| Remark-make | generates remarks content, may be able to move to source folder, see item 1 below | ? | private |
| OverArk | organizational/administrative repo | keep | public |
| Quark | question sets using econ-ark | keep | public |
| Quark-make | generates sets & solutions, needs to be separate to hide solutions, may be able to move to source folder, see item 1 below | ? | private |
| Demark | demonstrations of how to use econ-ark | keep | public |
| Demark-make | I'd like to move most of this to a source folder in demark and then delete, see item 1 below | delete | private |
| ExArk | Super-repo with submodules for DemARk, Remarks, Quark, Titlark - confused about the purpose of this repo, see item 2 below | ? | public |
| ExARK-make | delete if deleting exark, otherwise explore if can move this to source folder in exark and delete this repo | ? | private |
| Titlark | "Teaching and Instructional Tools for Learning the ARK" - clearly we should keep the content, but do we need a separate repo, see item 2 below | ? | public |
| Titlark-make | generates Titlark files, recommend we move this to source folder in Titlark then delete, see item 1 below | delete | private |
| HARK-make | generates HARK manual in pdf and markdown, need to implement alternate approach to HARK manual before deleting | keep (for now) | private |
| NARK | notation information; we should consider incorporating this directly into documentation but leave as is for now | keep | public |
| PARK | presentation materials, possibly worth keeping in version control but should at least be featured on the website too, they're easy to lose in a github repo | keep | public |
| PARK-make | generates files for Park, should move to source folder in PARK, see item 1 | delete | private |
| interARK | stores grant submission and similar stuff, I think this is better as folder in google drive | delete | private |
| econ-ark-make | looks like this was used as a todo list by Nathan and Alex in summer 2018, I recommend emailing them the contents and deleting the repo (also this file might make a good wiki page) | delete | private |
| jobs | job descriptions - there are better ways of sharing this info | deleted | public |
| scipy_proceedings | fork of scipy's version of this, doubt we need it anymore | deleted | public |
| Ballpark | papers that are "in the ballpark" of Ark - empty, there are better ways to capture this info than a github repo | delete | private |
| Ballpark-make | see description for "ballpark" | delete | private |
| Postmark | a mostly empty repo with a small text file describing plans for a journal - moved to google drive & deleted | deleted | private |
| Plutark | "a list of links to completed papers that use the Econ-ARK project" - empty, and we're using a new system to track this anyway | delete | private |
| Plutark-make | contains files to generate SciPy paper; I feel like this could be folded into one of the 6 other "using hark" repos - see item 2 below | delete | private |
| econ-ark.github.io | simply redirects to econ-ark.org | delete | public |

General Thoughts

1) -make repos

Generally speaking, I dislike the use of the private -make repos. If you're generating content, I recommend keeping the "generator" and the "generated" files in the same repository, just in separate top level folders. I've typically seen these called "build" and "source". In the README, you then say "to look at existing files, read the build, to change what the build looks like, edit the source".

The only reason you should need a separate repo is if there's something secret or private in the build, which doesn't seem relevant here, except possibly in the case of QUARKs, which have hidden solutions for question sets. (In an email, Chris said REMARKs have secret solutions as well? I'm not sure whether the scripts delete the solutions or merely temporarily hide them - if they're merely temporarily hidden, then we don't need a separate -make repo.)

The other reason that seems to be driving the use of these -make files is the difficulty of versioning jupyter notebooks as described in demark-make, so it's worth learning more about those constraints before making a hasty decision.

Otherwise, keeping build and source files in the same repo makes it easier to keep them in sync and produces less clutter.
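As a rough illustration of the "build" and "source" layout described above, here is a minimal Python sketch that lists source notebooks with no generated counterpart. The folder names `source` and `build` come from the description above; the function name `missing_builds` and the assumption that a generated file keeps the same relative path as its source are hypothetical.

```python
from pathlib import Path


def missing_builds(repo_root, pattern="*.ipynb"):
    """Return relative paths of source files with no matching file under build/.

    Assumes (hypothetically) that a generated file keeps the same relative
    path as the source file it was generated from.
    """
    root = Path(repo_root)
    source_dir = root / "source"
    build_dir = root / "build"
    missing = []
    for src in sorted(source_dir.rglob(pattern)):
        rel = src.relative_to(source_dir)
        if not (build_dir / rel).exists():
            missing.append(rel.as_posix())
    return missing
```

A check like this could run before each commit, so that a stale or missing build is caught while the source change is still fresh.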

2) Repos demonstrating usage of HARK

There's a bunch of repos for tracking usage of HARK. The ExARK repo uses some of them as submodules (DemARk, Remarks, Quark, Titlark) but not others (park, plutark, ballpark). It's really not clear to me how ExARK is meant to be used, though. Why 7 separate repos and a meta-repo, and not one repo with 7 folders? Couldn't nearly all of them go into DemARK as specific kinds of demonstrations?

There's also the question of what can be taken out of github altogether and put onto the website via a content management system. I think Park, Plutark, and Ballpark are the most likely candidates for this.

3) Documentation

When it comes to organizing our documentation, we may want to move the docs out of the gh-pages branch of HARK and into their own repo. I don't have a strong opinion on this yet.

Current Recommendation

  1. Move all the content of remarks/demarks/quarks/plutarks/titlark/park/ballpark into a single repo, using separate directories as needed to keep the content distinct. Delete all old repos, including interark.
  2. Move most of the -make content into a source folder in this single repo, keeping one -make version to handle any "secret" content (if any is necessary).
  3. Delete the other repos marked delete above, like jobs and scipy_proceedings.

shaunagm commented 5 years ago

Concrete Proposal for Consolidating Repos

There are currently seven (overlapping, ambiguous) types of content that we're tracking across 16 repos. They are:

I would like to propose a new set of categories:

Rather than keeping lists of papers that use the Econ-ARK project, and related papers, in a repo, we can use two different lists in the Zotero account (one for the "plutarks" and one for the "ballparks"), both of which can be linked to from the repository readme.

All of the notebooks have both "source" and "build" files. The source files are currently in separate -make directories, but they should be moved into "source" subfolders within the main repository. The only issue with doing this is the "quarks"/problem sets. I propose we retain the private quark-make repository for this and only this content.

Thus, the final structure of whatever-the-newly-named-repo-is would be:

Once the content is moved, any links to the content from other places (such as the website) are fixed, and we've verified that mybinder/colab works with the new notebook structure and that the problem sets are being generated appropriately, we'd delete all repositories except demark (or remark, or whichever repository we use as the main repository) and quark-make.

A note on naming: if Chris is very attached to the names, we can name the demonstrations folder "demarks", the replications folder "remarks", teaching as "titlarks", and presentations as "parks", although I think a more readable name would be better. The repository as a whole could keep the name demark, as all the subtypes are in some way demonstrations, but I don't really care what we name the repo as a whole.

shaunagm commented 5 years ago

@llorracc @mnwhite @pkofod - as promised, here's a proposal for re-organizing the bulk of our content. See the comment above this one. Let me know if you have questions, concerns, suggestions, counterproposals, etc.

shaunagm commented 5 years ago

Chris notes: things besides Quarks will need to be secret, plus other instructional stuff. (Caveat - want to have papers, that's non-instructional.)

pkofod commented 5 years ago

* README with, among other things, links to Zotero lists of papers that use HARK, papers related to HARK, and "wish list" papers that we'd like to see implemented in HARK and added to this repo

I think it is good to use the software/services that already exist (and that we already use) for keeping track of a list of papers, rather than keeping a "manually" written list in a repo's README, so I fully agree here.

shaunagm commented 5 years ago

Okay, I'm preparing to merge the repositories soon.

My plan is to use the DEMARKs repository as my base, because it's the main repo with open PRs, and I don't want to have to reissue those. So I'm planning to:

llorracc commented 5 years ago

@shaunagm, I've finally had a chance to read through this and think about it.

I think this is a good draft of a plan, but am not sure that it is ready for pulling the trigger. In particular, I want to process and absorb some of the ideas in the various extremely interesting presentations at PASC19 from various different research organizations -- esp the presentation on documentation that broke things down into categories like "tutorials" and "getting things done" and "retrieving information" (there was a fourth category that I can't remember at the moment but was also persuasive -- see the "issue" I posted in HARK -- which maybe should have been in OverARK).

A point that was emphasized over and over at PASC, and which I agree with entirely, is the importance of setting things up so our "meta" content is generated as automatically as possible. (The presenter strongly endorsed Sphinx for this purpose). For example, as Patrick suggested, we don't want to manually maintain a README file that links to papers; we want to generate that content. After the session I asked the assembled presenters whether they knew of ways to integrate Jupyter notebook content like what we have in our DemARKs and REMARKs with the documentation; for example, if Sphinx could present not only the docstrings, but also some index of where the object in question has been used in practice in Jupyter notebooks.

Connected to this, both the RFE presentation and the documentation presentation advocated the integration of content with Zenodo, Zotero, and the cff file format, and said that their technology for doing that was available and open source. A key seems to be to use the CFF file format because that is something that Google knows how to read and uses in its indexing of sites.

The session organizer promised to make sure that the presentations were all posted, but I don't think that has happened yet. When they are, I'd like you to scan them and then we need to have a long talk so we can process them together. Maybe we can find time to do this at SciPy.

Some further specific responses:

  1. There are two reasons to "hide" material:
    1. It needs to be kept private (like the QuARK content with answers)
    2. It is internally useful (like, "for PASC19 I want to start with Matt's presentation at SciPy last year and adapt it") but would be distracting clutter for users of the site. It's not particularly useful to them to be able to see all 15 versions of a particular presentation that we have given, but it is useful for us to be able to retrieve that history.
      • I'm not (yet) persuaded that the value of exposing the "-make" material (basically, reduction of the number of private repos), outweighs the cost (clutter and confusion that might be caused by "too much information."). I'm not saying I couldn't be persuaded, just that it does not seem obvious to me. Probably it would be useful to look further into how QuantEcon does this with the generation of its content from rst files.
  2. One consideration suggests that we should have even fewer repos than you have proposed: Some content may be appropriate for more than one of the categories. For example, a good number of the notebooks in the DemARK are both "instructional" in the sense of being potential fodder for introductory PhD courses teaching the substance of economic concepts, and "tutorial" in the sense of teaching how to use the Econ-ARK toolkit. Similarly, some of the DemARK content is basically explicating key ideas from some REMARKs. This cross-pollination is the reason I am keen on having some kind of metadata system based on tags, so that the econ-ark.org website can find material related to a particular tag across multiple repos. That probably does not mean that we need to pursue a "one repo to rule them all" strategy but I think it DOES mean we should try to work out some system of tags that could serve our purposes before making a decision about how to structure the repos.
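To make the tag idea concrete, here is a minimal Python sketch of a tag index that spans several repos. The record layout, the example paths, and the tag names are all hypothetical; this is only one way such a metadata system might look, not something that exists in Econ-ARK.

```python
from collections import defaultdict


def build_tag_index(records):
    """Map each tag to the sorted list of (repo, path) pairs that carry it."""
    index = defaultdict(list)
    for record in records:
        for tag in record["tags"]:
            index[tag].append((record["repo"], record["path"]))
    return {tag: sorted(items) for tag, items in index.items()}


# Hypothetical metadata records; a real system might read these from
# per-notebook metadata files instead of a hard-coded list.
records = [
    {"repo": "DemARK", "path": "notebooks/example-a.ipynb",
     "tags": ["instructional", "tutorial"]},
    {"repo": "REMARK", "path": "REMARKs/example-b.ipynb",
     "tags": ["replication"]},
    {"repo": "TITLARK", "path": "courses/example-c.ipynb",
     "tags": ["instructional"]},
]

index = build_tag_index(records)
print(index["instructional"])
```

With an index like this, a notebook that is both "instructional" and "tutorial" is found under either tag regardless of which repo it lives in, so the tag scheme can be settled independently of the repo structure.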

A final thing that I haven't figured out is whether (and if so, how) to include material that does not make direct use of Econ-ARK but is Econ-ARK adjacent. For example, I've developed a number of Jupyter notebooks for my first year PhD course, and will be developing material to present at my course in Budapest, that does not use the HARK toolkit at all. But in teaching, that material is interwoven with material that does use Econ-ARK tools. This material is now part of what is in the TITLARK but that is not a particularly good name for it. I'm still pondering this.

shaunagm commented 5 years ago

Do you want to just table this issue - and, necessarily, the issue of how Andrij will implement the website-side of notebook discovery - until we have a chance to view these presentations? One option is to email the presenters directly and ask for copies of their slides. Or were the sessions videotaped?

I have some replies to the other questions you have, but maybe it makes sense for me to wait until I read/view the presentations and understand where you're coming from before engaging.

llorracc commented 5 years ago

Yes. I've just pinged the organizer.

samdbrice commented 5 years ago

Everything I was thinking of is already mentioned above.

It's important to be mindful of your audience on GitHub - developers. A regular user may get to Econ-Ark.org, find out how to install the library, open a few notebooks, and never visit the repositories.

Generalizing the repository names is critical because it's not something you can easily change in the future when you have hundreds/thousands of forks or hardcoded values in scripts/urls etc.

The naming currently being used for the repos makes more sense as "pages" or "products" on the website because TITLARK conceptually is not a "repository" but rather a "data product." Ultimately the TITLARK webpage could pull/reference "data" from econ-ark/courses or econ-ark/demos (or wherever) but you have the flexibility of changing the content and presentation of TITLARK without having to spend a lot of time on the backend wondering if a new piece of data belongs in DemARK or TITLARK. And if you ever decide to change the name from TITLARK to something else (or discontinue the data product) you don't have to "break" your repositories/urls because the generic repo names would still be representative of their data/content.

Based on the above recommendation and what's already been mentioned here's what I was proposing.

https://github.com/econ-ark/*-make -> tools
https://github.com/econ-ark/DemARK -> demos
https://github.com/econ-ark/TITLARK -> courses
https://github.com/econ-ark/REMARK -> replications
https://github.com/econ-ark/PARK -> presentations
https://github.com/econ-ark/OverARK -> management
https://github.com/econ-ark/ballpark -> papers
https://github.com/econ-ark/QuARK -> exercises

The fewer repos the better, and the more generic the names the more flexibility you have as things change.

shaunagm commented 5 years ago

Thanks, Sam. I had thought of combining all of the "using HARK" content into one repository - what do you think of that idea?

samdbrice commented 5 years ago

In principle, I am for reducing the number of repositories.

Which ones were you thinking of specifically and what would the repo be called? (sorry if you already mentioned above)

samdbrice commented 5 years ago

I notice above you mentioned generalized names (Demonstrations, Instructional, etc), but it would help to see your mapping of (current-nicknames --> proposed-names) as that's easier for me to digest since I'm still not familiar with the nicknames.

shaunagm commented 5 years ago

I'm not fussy about the names - your proposed ones are very similar to mine - I care more about the general idea of consolidation. Although I'm worried now that that's a non-starter, if having a nested directory structure will mess with mybinder displaying our notebooks.

In terms of which repos specifically, this subset (taken from your names above):

https://github.com/econ-ark/*-make -> tools
https://github.com/econ-ark/DemARK -> demos
https://github.com/econ-ark/TITLARK -> courses
https://github.com/econ-ark/REMARK -> replications
https://github.com/econ-ark/PARK -> presentations
https://github.com/econ-ark/QuARK -> exercises

OverArk (renamed perhaps) would remain separate, and we'd move ballpark (and plutark, which you don't mention) out of github entirely and use an existing paper-storage tool

samdbrice commented 5 years ago

Looks good to me. Yeah, if nesting becomes an issue for mybinder we can flatten accordingly.

shaunagm commented 5 years ago

Well, it's a lot of work to merge 7 repositories into 1, so it would be good to know ahead of time if it's going to be an issue for mybinder. If you're up for looking into this, that'd be great, otherwise I'll check myself before implementing anything.

samdbrice commented 5 years ago

Ok, let me know what you find out - I'll also do some research.

What else remains to reach a consensus on this and get the work done?

samdbrice commented 5 years ago

Mybinder doesn't seem to have any problem displaying notebooks in nested directories: https://hub.gke.mybinder.org/user/sbrice-setup.py-9i8djjr7/tree

You are still limited to a single version of Econ-ARK for all the notebooks, though.

shaunagm commented 5 years ago

Great! Thank you for checking into that.

samdbrice commented 5 years ago

Are we clear to execute on this?

shaunagm commented 5 years ago

What would you execute? You are very welcome to create a new repo that copies the existing repositories with new, clearer names and a nested structure, but please don't change anything on the existing repos as there's stuff pointing to them in our website, docs, wiki, etc that we need to change first.
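A first pass at finding the "stuff pointing to" the existing repos could be automated. The following Python sketch scans text for links of the form `github.com/econ-ark/<repo>` and reports which documents mention which repo. The function name, the regular expression, and the idea of passing documents in as a name-to-text mapping are assumptions made for illustration.

```python
import re

# Matches links such as https://github.com/econ-ark/DemARK and captures the repo name.
REPO_LINK = re.compile(r"github\.com/econ-ark/([A-Za-z0-9_.-]+)")


def inventory_links(documents):
    """Return {repo_name: sorted list of document names that link to it}.

    `documents` maps a document name (file path, wiki page title, ...) to its text.
    """
    found = {}
    for name, text in documents.items():
        for repo in REPO_LINK.findall(text):
            repo = repo.rstrip(".")  # drop a sentence-ending period
            found.setdefault(repo, set()).add(name)
    return {repo: sorted(names) for repo, names in found.items()}
```

Running this over a checkout of the website, docs, and wiki would give a list of places to update before any repo is renamed or removed. It would not catch relative links or submodule references, which would need a separate check.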

samdbrice commented 5 years ago

Yes, exactly what I wanted to do. Thanks, I'm on it!

llorracc commented 5 years ago

I haven’t had time to review this whole chain, and am very busy until Fri teaching a mini-course in Budapest. So long as you don’t change existing stuff you are welcome to clone anything and experiment with alternative structures. But, as Shauna says, there are links between the repos that would need to be inventoried and addressed before we can make the switchover.

PS. The repos like ExARK and QuARK and TITLARK were created mostly for my own convenience (esp ExARK) rather than in the anticipation that anybody else would use them (except when I explicitly directed people there).

The problem of having everything in one directory structure is that some content (like solutions to problems posed on tutorial/teaching notebooks) needs to be kept permanently private, and needs to generate the content in the public-facing repo. Concretely, I create the notebook with problems and solutions, then generate the notebook without solutions but still problems, then a version without any problems or solutions.

Shauna and I discussed this but I don’t remember if we came up with a satisfactory solution.
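The three-stage generation described above can be sketched in Python by filtering cells on their tags. This is only an illustration under stated assumptions: the tag names `problem` and `solution` and the function `strip_cells` are hypothetical, and the existing -make scripts may work quite differently. The only thing relied on here is that a Jupyter notebook is a JSON document with a `cells` list, and that each cell's `metadata` may carry a `tags` list.

```python
import copy


def strip_cells(notebook, drop_tags):
    """Return a copy of a notebook dict without cells carrying any tag in drop_tags."""
    drop = set(drop_tags)
    result = copy.deepcopy(notebook)
    result["cells"] = [
        cell for cell in result["cells"]
        if not drop & set(cell.get("metadata", {}).get("tags", []))
    ]
    return result


# A tiny in-memory notebook in the Jupyter JSON layout (nbformat 4).
full = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["Exposition"]},
        {"cell_type": "markdown", "metadata": {"tags": ["problem"]},
         "source": ["Exercise 1"]},
        {"cell_type": "code", "metadata": {"tags": ["solution"]},
         "source": ["answer = 42"], "execution_count": None, "outputs": []},
    ],
}

problems_only = strip_cells(full, {"solution"})      # problems, no solutions
plain = strip_cells(full, {"problem", "solution"})   # neither problems nor solutions
print(len(full["cells"]), len(problems_only["cells"]), len(plain["cells"]))
```

Under this scheme only `full` would need to live in a private location; the two stripped versions contain no trace of the removed cells, so they could be written to the public-facing repo.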

There are a number of other considerations, and I’m not sure that it is a good idea to invest too much time in figuring out a plan until we understand what Andrij will propose for the superstructure for launching notebooks.

shaunagm commented 5 years ago

@sbrice Did you ever go ahead and make the alternate/renamed/consolidated repo as suggested by this comment?

shaunagm commented 5 years ago

@sbrice can you please let me know either way?

samdbrice commented 5 years ago

Hi, somehow missed the first comment. Yes, I did create the 'demos' repo that's currently linked with the latest docs in HARK but wasn't sure if y'all decided to go a different direction since DemARK etc were still heavily being used.

shaunagm commented 5 years ago

Well, I don't think any of us noticed that you created the demos repo. I guess we somehow missed it too. :)

Were you planning to do any more work on this? Our roadmap above was to merge all of the following into a single repo:

https://github.com/econ-ark/*-make 
https://github.com/econ-ark/DemARK 
https://github.com/econ-ark/TITLARK 
https://github.com/econ-ark/REMARK
https://github.com/econ-ark/PARK 
https://github.com/econ-ark/QuARK 

Looks like you've put part of DemARK in this new repo - were you planning on moving the rest over once you got a thumbs up from us?

samdbrice commented 5 years ago

Yes, I can continue the move and finish it over this weekend.

Are other related pieces (e.g. HARK --> Sphinx --> DemARK) fairly stable at this point? I haven't noticed much activity there but we'd have to reconfigure anything set up with DemARK once everything is consolidated.

llorracc commented 5 years ago

By HARK-->Sphinx-->DemARK I think you mean that the Sphinx documentation incorporates some of the notebooks in the DemARK as examples in the readthedocs autogenerated documentation?

It's been on my to do list for quite a while to identify which notebooks should be so treated. I've kind of been waiting until it is clearer how we are going to go about serving up the notebooks in the future; Shauna and I have a call at 9am on Monday morning about that, so let's touch base maybe next Wednesday or Thu and see if the future of the notebooks has clarified.

samdbrice commented 5 years ago

Correct. In that case, I'll do whatever I can now and we can figure out next step when we touch base. Thanks!

shaunagm commented 5 years ago

Let me know if you get blocked so I can help unblock you - I don't think the question of "which notebooks are going in the documentation" should block any of this work.

samdbrice commented 5 years ago

Just a horribly busy weekend. I'll wrap things up and push my changes tonight or tomorrow morning before we touch base.

samdbrice commented 5 years ago

I've created the codebase econ-ark/extras to consolidate the following repos:

econ-ark/DemARK is still in econ-ark/demos as I'm not sure if that should be consolidated. My intuition is to keep it separate because that's a high visibility/commit repo and keeping it as simple as possible reduces confusion and improves sustainability.

The following codebases seem to be stale or duplicates - and I've consolidated them into an econ-ark/archives repo:

The following codebases seem empty and should probably be deleted:

With the above proposed changes the GitHub org would go from 24 repos to 8. You can see an illustrated before/after attached:

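[image: untitled (2)] https://user-images.githubusercontent.com/7470577/65048095-90f34e00-d931-11e9-87e5-a8ef55727a8d.png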

samdbrice commented 5 years ago

As for the private/solutions notebooks that should not be public-facing - I can create an econ-ark/private repo that contains an econ-ark/private/solutions directory but I'm not sure what exactly should be private.

shaunagm commented 5 years ago

@llorracc - when you get a chance, can you specify for @sbrice what elements of the titlark-make repository need to remain private?

llorracc commented 5 years ago

Sam,

A question: My understanding was that GitHub designates every repo as either “public” or “private.” It looks like your new organizational scheme mixes together in the “extras” repo some material that should be public (e.g., the REMARKs) and some that should be private (e.g., the QuARK stuff). Am I mistaken? Is it possible (maybe it is a superpower of an “organization”) to mix private and public material in the same repo? If so, can you point me to a resource that describes how this works?

Comments:

  1. Lots of people out there in the world have been told about the existence of the REMARK and DemARK repos. And I feel strongly that people should be able to obtain the content of these repos without obtaining a complete reinstallation of Econ-ARK. For one thing, they may want to put them at some arbitrary location on their computer where it is convenient. So my proposal is to leave these as standalone repos. I’m open to the proposition that maybe they should also be installed in some canonical location whenever the toolkit itself is installed (though Matt White disagrees). But if so, I feel that the way to do this is by making them submodules, while leaving the master version of them as standalone repos. This also would allow us to integrate them into our travis tests (which we have been talking about doing for a long time).
  2. My naming convention has been to call something, e.g., REMARK-make rather than make-REMARK on the grounds that the resource and the tools that make it will then appear right next to each other in alphabetical file listings. I’d like to keep with this naming convention as it makes it easier to realize when some material in a repo is generated by a make directory and should not be directly edited.
  3. I’m fond of my “*-ARK” naming convention, and would like to retain it as being easier to remember and more distinctive than, for example, “demos”. If there is a STRONG reason to abandon this naming scheme I will consider it, at least for some of the repos (and especially ones that pretty much nobody else has interacted with like the TITLARK).
  4. On defunct or duplicates:
    • PLUTARK is indeed something that I created but basically have not used
    • ExARK basically collects together (as submodules) all three of the repos that contain examples or illustrations of the use of the ARK, because there are some programmatic things (like the generation of student exercises) that I have done with materials in all three of these repos. If we make all three of those repos “submodules” and put them together as subdirectories in the main repo, my code for generating student exercises should work without much modification (especially if the directory of which all of these are subdirectories is called the ExARK).
    • ballpark-make is indeed empty now, but it seems likely to me that at some point in the future we may want to generate some of the content in the ballpark using tools in a private location called ballpark-make. So let’s make sure there’s a place for it in the proposed new file structure. (The ballpark repo itself could perhaps live in the same subdirectory as the DemARK, REMARK, and QuARK content).

On Tue, Sep 17, 2019 at 9:57 AM Samuel Brice notifications@github.com wrote:

I've created the codebase econ-ark/extras https://github.com/econ-ark/extras to consolidate the following repos:

econ-ark/DemARK https://github.com/econ-ark/DemARK is still in econ-ark/demos https://github.com/econ-ark/demos as I'm not sure if that should be consolidated. My intuition is to keep it separate because that's a high visibility/commit repo and keeping it as simple as possible reduces confusion and improves sustainability.

The following codebases seem to be stale or duplicates - and I've consolidated them into an econ-ark/archives https://github.com/econ-ark/archives repo:

The following codebases seem empty and should probably be deleted:

With the above proposed changes the GitHub org would go from 24 repos to

  1. You can see an illustrated before/after attached:

[image: untitled (2)] https://user-images.githubusercontent.com/7470577/65048095-90f34e00-d931-11e9-87e5-a8ef55727a8d.png


samdbrice commented 5 years ago

Let me start by saying I like the naming convention and believe it should remain.

My recommendations boil down to how it seems that Econ-ARK currently uses GitHub more as a website than as a software repository, which is perfectly fine. I explain further above (Jul 23rd) the drawbacks associated with that pattern but for a more concrete example, you could check out the DemARK-make/notebooks folder. The way to avoid such problems is by storing data and content separately. In the case of DemARK and QuARK it could look something like this (using three repositories):

quark-demos-demark

In the abstract, you're already doing this. The difference is you're using *-make as the level of indirection - which, implementation wise, doesn't solve your data duplication/homelessness problem. Ideally, the implementation looks more like this (using five repositories):

make-demos

(#A) ...which I don't think is ideal because you can achieve the same thing with fewer repositories.

(I didn't have any visibility into the *-make repositories back in July and so my recommendations didn't fully take those into account)

Having DemARK listed adjacent to DemARK-make only in the filesystem is something of an anti-pattern: you end up with the DemARK-make repository referencing a sibling that is not tracked in the same version control tree. For example, this old remove_Problems..-fails.sh#line6 script in DemARK-make references something it doesn't inherently know how to track (e.g. ../DemARK). I understand the reason is that DemARK-make contains sensitive information and there's no clear way of mixing "private" and "public" repositories (pretty much the first question you asked).

There is a way of mixing "public" and "private" repositories but that only scales if you're separating data and content appropriately (and I wouldn't exactly call it a superpower 😄). As hinted in (#A) above, you can achieve this while also reducing the number of repositories. That starts by consolidating the *-make repositories into something (e.g. MakeARK) that references data (e.g. demos) and content (e.g. DemARK) as sibling submodules within the same version control tree. Within MakeARK is where you store your make scripts that (e.g.) pull from the demos submodule to generate the DemARK submodule. The make scripts should exist within the same repo because they all very much do the same thing. In the end, you're getting a double bonus by (1) reducing duplication of data, and (2) reducing duplication of scripts. Ultimately the implementation would look something like this:

MakeARK

Notice how MakeARK only pulls the pieces it needs from demos to generate DemARK. That's how you can cater DemARK (or QuARK) to different user groups (e.g. Students vs Economist) by grouping the right presentation with the right demos/exercises without duplicating the data. For example, you could have DemARK/For-Students and DemARK/For-Economist with different/evolving layouts/representations (see Jul 23rd).

That's also how you'd go about generating static content for the website.
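To make the "pull only the pieces you need" idea concrete, here is a minimal sketch of what one such make script could look like. Everything in it is an assumption for illustration: the `AUDIENCES` manifest, the notebook file names, and the `build` function are hypothetical, and only the demos, DemARK, For-Students and For-Economist names come from the discussion above.

```python
import shutil
from pathlib import Path

# Hypothetical manifest: which source notebooks each audience receives.
# The file names are placeholders, not real Econ-ARK notebooks.
AUDIENCES = {
    "For-Students": ["intro.ipynb", "exercises.ipynb"],
    "For-Economist": ["intro.ipynb", "advanced.ipynb"],
}


def build(source_dir, target_dir, audiences=AUDIENCES):
    """Copy only the listed files from source_dir into one folder per audience.

    source_dir plays the role of the `demos` submodule (single source of truth);
    target_dir plays the role of the generated `DemARK` submodule.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    written = []
    for audience, names in audiences.items():
        out = target_dir / audience
        out.mkdir(parents=True, exist_ok=True)
        for name in names:
            src = source_dir / name
            if not src.exists():
                raise FileNotFoundError(f"{src} is listed for {audience} but missing")
            shutil.copy2(src, out / name)
            written.append(out / name)
    return written
```

The source notebooks live in one place only; the per-audience folders are build output that can be regenerated at any time, so nobody needs to edit them by hand.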

The way you'd go about implementing this (and to finally answer your question) is by having MakeARK be a private repository only accessible by a select team (e.g. maintainers) and having all the relevant repositories within MakeARK as submodules. The submodules themselves can be mixed in terms of "public" vs "private". Whenever you run the appropriate make script (e.g. remarkify.py) it would pull from the appropriate submodules (e.g. demos and presentations) and generate the latest REMARK (sub)module. Once the updated REMARK submodule is ready to be published you'd move into the REMARK submodule then git-commit and git-push as normal.

I'd actually recommend that you go one step further and have MakeARK (and the maker scripts) be public, and only keep a solutions submodule as private (i.e. within a private repo). That way the maker scripts are open source and can be improved like any other software. When implemented as such anyone can clone MakeARK but they would not be able to clone solutions submodule even from within MakeARK (only maintainers can clone/modify the solutions submodule).
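If the maker scripts are public, they need to cope with contributors who cannot read the private submodule. A rough sketch of how a script might handle that, assuming the private submodule is checked out at a folder named `solutions` (a hypothetical path) and relying on the fact that a submodule that has not been initialized appears as an empty directory:

```python
from pathlib import Path


def submodule_is_populated(path):
    """True if `path` is a directory containing at least one entry."""
    path = Path(path)
    return path.is_dir() and any(path.iterdir())


def plan_build(repo_root, solutions_dir="solutions"):
    """Decide which outputs to generate based on whether solutions are readable.

    The output names are hypothetical labels, not real Econ-ARK targets.
    """
    outputs = ["problems"]
    if submodule_is_populated(Path(repo_root) / solutions_dir):
        outputs.append("problems-and-solutions")
    return outputs
```

With a check like this, an outside contributor can still run and improve the public parts of the build, while maintainers with access get the full output.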

-- I believe the above sufficiently addresses your question and comments although I can go into more detail if you'd like.

-- Below is mostly semantics which is somewhat relevant to this issue but not exactly significant.

I mentioned how Econ-ARK uses the GitHub organization more like a website than a repository. If you visit the GitHub organizations github.com/QuantEcon or github.com/openjournals or github.com/shogun-toolbox or github.com/numpy you'll notice they use common nouns for repositories containing mostly static data and they use proper nouns for repositories containing software, tools, or products.

Part of the reason is that there are programming tools that reference GitHub URLs directly as an "install" target. Most often, from scanning a GitHub organization, you'll find that proper nouns are installable whereas common nouns are clonable. For example, the idea of pip install git+git://github.com/econ-ark/HARK.git is perfectly intuitive but pip install git+git://github.com/econ-ark/presentations.git is not (not to say there can't exist such a use case). For me pip install git+git://github.com/econ-ark/DemARK.git falls somewhere in between and that's why it took me some time to understand how and where data vs. software is organized within Econ-ARK.

shaunagm commented 5 years ago

@sbrice can we have a call about this? It feels stalled out. I'd like to push forward, but I don't want to just make changes and mess with your plans.

samdbrice commented 5 years ago

Yes, will email you now to set up a time.

llorracc commented 5 years ago

Incidentally, I’ve made a bit of progress on my system to take a (private) “[notebook]-Problems-And-Solutions.ipynb” notebook and automatically create “[notebook]-Problems.ipynb” and “[notebook].ipynb”. You can see an example in:

BufferStockTheory-Problems-and-Solutions https://github.com/econ-ark/TITLARK-make/Resources/notebooks/Problems-and-Solutions/BufferStockTheory-Problems-and-Solutions.ipynb

BufferStockTheory-Problems https://github.com/econ-ark/QuARK/blob/master/notebooks/BufferStockTheory-Problems.ipynb

BufferStockTheory https://github.com/econ-ark/REMARK/tree/master/REMARKs and click on BufferStockTheory/Code/Python/BufferStockTheory.ipynb

PS. The need to click in the last case is caused by the fact that BufferStockTheory is a submodule, and thus does not have a direct permanent url, another problem that we would like to find a solution to.
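For anyone curious what this kind of split could look like, here is a minimal sketch. It is not the actual system: it assumes cells are marked with the metadata tags `solution` and `problem` (hypothetical names), and it works directly on the notebook JSON so it needs nothing beyond the standard library.

```python
import copy
import json
from pathlib import Path


def drop_tagged_cells(notebook, tag):
    """Return a copy of a notebook dict without cells whose metadata tags include `tag`."""
    result = copy.deepcopy(notebook)
    result["cells"] = [
        cell for cell in result.get("cells", [])
        if tag not in cell.get("metadata", {}).get("tags", [])
    ]
    return result


def split_notebook(combined_path, problems_path, plain_path):
    """Write the two derived notebooks from one combined source notebook."""
    combined = json.loads(Path(combined_path).read_text(encoding="utf-8"))
    # Hypothetical tag names; the real system may mark cells differently.
    problems = drop_tagged_cells(combined, "solution")
    plain = drop_tagged_cells(combined, "problem")
    Path(problems_path).write_text(json.dumps(problems, indent=1), encoding="utf-8")
    Path(plain_path).write_text(json.dumps(plain, indent=1), encoding="utf-8")
```

The point of the design is that only the combined notebook is ever edited; both public notebooks are regenerated from it, so they cannot drift apart.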


shaunagm commented 4 years ago

After a lot of discussion, we've decided that making the Econ-Ark organization's repositories easily navigable is not a priority. We're going to pin the most important repos, and provide good descriptions in our documentation, but we're no longer aiming to make it easy for someone to come to the Econ-Ark org, browse our repos, and understand what's there.

(Context: Chris and Mridul believe the best way to handle remarks involves forking individual remarks to the repository, which is going to make the org so difficult to navigate that it's not worth optimizing the other folders.)

That said, we did tackle some low-hanging fruit:

Stuff that we're waiting on:

shaunagm commented 4 years ago

We're still waiting on remark-make, but I'm going to close this as done - I've made a note to finish with remark-make at the next meeting.