clemsonacm / hackpack

A collection of tools and algorithms used by Clemson ACM

High level organization: why aren't we targeting a contest environment? #63

Closed ProtractorNinja closed 8 years ago

ProtractorNinja commented 8 years ago

Splitting off from #60:

@protractorninja said: Why are we forcing sample code to live in an example contest problem? This structure fits as a learning resource, but honestly, our goal right now should be to optimize for a contest environment. Wouldn't it be more useful to have a bare-bones DS/algorithm implementation for quick reference?

All the sample problem IO/format/etc. junk takes up a lot of space, complicates things when you're trying to find an algorithm during a contest -- it's tough enough thinking about the problem you're currently trying to solve -- and, generally speaking, requiring a sample problem makes contribution more challenging. They just aren't very useful for a contest environment.

Hence, I think that Algorithm and Data Structure sections should only require Description and Reference sections ("Applications", if actually relevant, should be a sub-section under Description), with Sample Problems as an optional extra for the HackPack++ except in cases of extreme relevance.

Here's a relevant problem: I poked around to test the claim that the Hack Pack didn't have any code for Dijkstra's Algorithm in it. The only sample code for Dijkstra's Algorithm is currently hiding in the graph section, on line 84 of a sample contest problem solution, denoted only by //dijkstra woohoo. That's not in the index (surprise!) and the Dijkstra chapter -- which is cluttered up with three example contest problems that add literally nothing in a contest environment except for confusion and cows -- doesn't contain any code or reference any code. It doesn't even explain how the algorithm works.

That's terrible.

robertu94 commented 8 years ago

Just for the sake of completeness, let's remind ourselves of the reasons for two versions of the Hack Pack:

  1. The Hack Pack++ version exists as a learning resource that one could use to learn the algorithms and problem types if they had not seen or heard of them before, for use in a class like CPSC 481.
  2. The Hack Pack version exists to be a quick programming reference with all extraneous details removed for use in a contest environment.

I believe that the bulk of these problems are symptoms of improper implementation instead of bad design. For example:

  1. All submissions are required to have sample code that solves at least one of the problems. The Dijkstra section does not.
  2. All sections are required to have index entries for key terms. The Dijkstra section only has one term.
  3. All sections are required to have IO code removed in the minified version. The Dijkstra section does not.
  4. All sections are required to have a description that explains how the algorithm works. The Dijkstra section does not.
  5. If it does not already exist, I have planned on creating an issue to move the aforementioned code in the Graph section under Dijkstra (issue #64)

Now comes the question of how this happens:

  1. We have limited resources to conduct reviews and corrections, and occasionally stuff gets missed. Perhaps we could start using the style guides as a checklist before merging code to address this issue. Once the changes to the style guides are finished, I will go through and audit the sections for compliance using this method.
  2. There previously was a solution to Dijkstra in this section, but it was removed because it was not explicitly licensed in a way that could be used in the Hack Pack. I have created issue #64 to address this.
  3. One of the ways that we get contributions is via students in CPSC 481 writing sections. Sometimes these students only take the class for one semester, and simply didn't have time to make corrections that were needed by the time that they finished. We have attempted to address this with a contribution deadline this semester.

Here are some of the reasons why we require sample IO:

  1. It makes it much more likely that the code was tested. In the old resources prior to the Hack Pack, there were several examples where the sample code was just wrong. These problems would have been found in testing, but because there was no IO they were never found.
  2. It makes it easier to see what the required input to a problem might be. In the older resources prior to the Hack Pack, there was code that used an array that started with an index of 1 instead of 0. There were no comments to that effect, but it would have been noticed if there had been sample IO for the code.
  3. It provides a lightweight means of verifying that code was entered correctly. You could use the sample input and output to help verify if a problem was typed correctly.

The major reason why we require code to be part of a sample solution is that it is sometimes difficult to tell where code like this may be used. Consider flow problems: they are used to solve maximal matchings. Without extensive experience with graphs, I doubt that you would guess this, and leaving that unexplained does not meet the Hack Pack++'s goal of being a learning resource for inexperienced team members. People with experience who just want the algorithm should use the Hack Pack, not the Hack Pack++.

I believe that dropping the Sample problem requirement would be a mistake.

mdclyburn commented 8 years ago

Beside being handy for seeing obscure solutions to problems, what else are the contest problems doing for us? Sure, max flow problems may be one of the more elusive ones to pick out, but is that true for most contest problems? Or are we needlessly exploding the size of the Hack Pack?

ProtractorNinja commented 8 years ago

I'm concerned that our goals are focused in entirely the wrong place. Should we really give equal importance to "learning reference" and "contest aid"?

That is, how many times has anyone actually looked to the Hack Pack for learning so far? I suspect the answer is close to zero, yet we still seem to over-prioritize the Hackpack++ aspect. It's even holding us back by requiring far too much of each submission: saying "you must contribute a lot of good stuff" earns us "a lot of probably bad stuff" instead of the more optimal "just enough great stuff" (see almost every external submission so far for examples). Condensation is a maybe-step-two afterthought, as is obvious from the proposed LaTeX guidelines.

In the case of Dijkstra, for example, we can't put all the blame on our misguided contributor. Because of our guidelines for submission, he ended up writing a ton of useless cruft instead of what Marshall and I were looking for during the ICPC: how to do Dijkstra.

That's a big problem on our side. Shouldn't we be going for "make something that's extremely useful in a contest environment" first, with additional reference as an extra?

mdclyburn commented 8 years ago

Perhaps we should provide a structure to allow submitting separate pieces? Dr. Dean likes for his 481 students to author problems, correct? Perhaps they could elect to author the description and reference code or a problem for each section in the Hack Pack? This would make it a lot easier to get content into the Hack Pack. But it could also leave us with sections that are of inconsistent quality if submitted content isn't more carefully vetted.

As I see it, this is a problem with focus and priority. I like the idea of the Hack Pack++. References are really cool and useful, but, like @ProtractorNinja wrote, it keeps the Hack Pack from gaining content in a swift manner because it requires so much bulk (the biggest of that bulk being the sample/contest problem portion).

We should prioritize the description and reference code content of a section first. These would be the more helpful things to use in a contest environment. Then we could look into adding problems...

ProtractorNinja commented 8 years ago

I stand by the complaints I have already raised regarding the Hack Pack's current direction, and I can't fully approve of project-defining documents until those issues are resolved. Anything to add, @mdclyburn and @robertu94?

robertu94 commented 8 years ago

I am content with prioritizing Hack Pack content over Hack Pack++, but I do not believe that sample problems with input and output are unnecessary, for the reasons that I listed above. Yes, many problem types are more obvious than max flows. That does not mean that an implementation cannot contain nuance that can be revealed by input/output code as well as by sample input.

I also would be fine with allowing piecemeal submission. I just hadn't broken down the issues further.

mdclyburn commented 8 years ago

I stand behind the idea of prioritizing Hack Pack content before Hack Pack++ is even considered. It might even be a more natural fit to the process. I also want to bring up the fact that we may not see much of anything come out of Dr. Dean's seminar this semester, since it is not a variable credit course any longer (not that we had a high volume of high-quality content before).

ProtractorNinja commented 8 years ago

Sample problems, sample data, full code, and unit tests are surely helpful in complex cases, but requiring them for every submission makes submission harder and bloats the hack pack.

Submission should be easy. Large code projects need tests and proof because the code will live on effectively forever, enduring changes and supporting the rest of the code base. Code like that, in a living environment, needs to continually prove itself. For our purposes, submissions only need to be "proven" once, during approval, because our delivery is a static document: unit tests and needlessly full-featured code samples are overkill unless context demands them. Because we have GitHub, all we need to solve our this-reference-code-doesn't-work problem is intelligent reviewers who actually investigate the code. "Give me a larger sample that shows this code works" is a fitting demand for submissions that need it, but currently, if I want to submit a cool Python function I found helpful at the last Mercer competition, my excitement to contribute will end up dead in the water before I even finish reading the README.

Once created and deployed, the average full I/O sample problem clogs the pack more than it helps. When I'm combing the hack pack during a competition, I'm only interested in the basics: a description of each algorithm or structure and a short, non-complex sample implementation with, if needed, an annotation about what kind of data it operates on or other noteworthy details. If needed. I don't want to see the standard includes unless I need them, I don't need to see int main(...) or sscanf(...) unless they fit the context, I don't want to have to think about how Bessie's troubles relate to mine, and I definitely don't want to see half a page taken up by garbage like "Sample Input: 4"!

The ideal algorithm or data structure sample doesn't need any of this extra stuff because the ideal snippet is short, sweet, and parsable; inevitable exceptions notwithstanding. We should be encouraging high-quality content like that, not average-quality content that simply ticks all the boxes. Heavy-handed checklist-style submission approval tactics will distract us (and future contributors/organizers) from what's important, like making sure that people actually find the hack pack useful.

The hack pack should be useful. It really isn't. I'm appalled at how useless the hack pack was for me during the ICPC last year. It's an overly formal mess of nigh-useless content with a few flecks of gold hidden inside. Something is very wrong if anyone ever gets close to thinking that.

The more I consider the idea, the less I care for even having the Hack Pack++ in addition to the standard Hack Pack. We're not doing one thing well; we're doing two or three things poorly. Perhaps it would be more prudent to focus on creating a streamlined, damn useful, practical document and expand it later only if interest is raised?

I think that our current helpless state demonstrates that we bit off more than we could chew. We could really benefit by reducing the scope of this project and re-evaluating what the team really needs. We don't need a formal process for anything until then.

robertu94 commented 8 years ago

Even if we do not include the full implementations of the solutions in the Hack Pack itself, I am not convinced that we can ensure functional code without them. These "ah! the reviewers will save us" methods fail us all the time because people are failing creatures.

If you can prove exhaustively that code works without test cases and without actually compiling and testing it, then I would be happy to drop the requirement. However, time and time again we see that it is not careful review that ultimately generates working code, but thorough testing. The last thing that we want to do in a contest is not to read a few extra lines of context and usage, but to have to debug code from our cheatsheet that flat out doesn't work.

As for your Python snippets, if they are sufficiently short, they probably don't belong in the algorithms or data structures sections, where this kind of testing is essential. Rather, they should exist in the examples section, where things like Makefiles are stored.

I also don't mean to discount your experience that you didn't find the document extremely useful, but if you are using the HackPack++ during a contest, you are going to be miserable, especially if you are an experienced veteran like you are. There is a ton of stuff in it that you don't need. Beyond needing the HackPack instead of the HackPack++: when you had your problems, a majority of the document had not been marked to be minimized, and almost none of the sections were indexed. That is the tragedy of a work in progress, not a flawed design.

I am happy to put a greater focus on writing the HackPack than the HackPack++, but not at the expense of thorough testing.

ProtractorNinja commented 8 years ago

I was using the regular hack pack. It's bad. A solid design would not have enabled its creation.

I disagree with, well, just about everything you've said. I don't think you've effectively discounted any of the points I've already raised: again, submission approval is a process that only needs to be done once. Submitted code should, generally, be happily generic and widely applicable. It shouldn't be the sort of thing that needs unit tests to prove correctness.

Eliminating Dr. Dean's credit boost means that we're only going to be getting submissions from people who are interested in making the hack pack better. We don't need unit testing to tone down any kind of content influx. We need good code that works. Unit testing is one way to get it, but requiring it now is overkill for the reasons I've already stated. Critical code should be elegant enough to innately demonstrate its correctness.

People are failing creatures, yes, which is why it's nice that multiple people can review something more than once. They might mess up, but we don't need to plan for that quite yet: we haven't yet seen that reviewers are going to have trouble if they spend the time to really get involved with the code. Submissions that were already bad aren't going to get much better if they get smushed into unit tests; ironing out little bits of incorrectness won't help if the thing just sucks overall.

I think that this sweet, short, testing-not-needed code ideal is the biggest difference in our thought processes. Algorithm and data structure base definitions and samples shouldn't need to be unit tested, but we could say that full, input-driven samples that exist for certain complex processes do.

mdclyburn commented 8 years ago

Well. I didn't have any strong opinion on the policy of unit testing, but @ProtractorNinja brings up a good point. Do we even need unit testing? Unit testing is cool, I'll admit. I use it on all of the code that I care about. But its primary use is to ensure that a growing/shrinking/morphing code base still works in the way the author(s) intend it to. Can part of the review process for submission be the reviewer testing the code? If we can do this, we can cut down on a lot of bloat. And if one person isn't enough to review a submission, add another reviewer. I'm sure we can easily follow the rule of not merging in content if it is erroneous.

Make a case for unit testing. Once a submission makes it into the Hack Pack, it should be correct. Unless there is a good reason for unit testing, I want it thrown out.

robertu94 commented 8 years ago

If I haven't convinced you both yet -- that a set of complete tests is necessary for ensuring functional code -- I doubt that I will. Go ahead and remove the requirement and make it a suggestion if that is what you both think is right. I will continue writing tests for the code that I write, and that will be the end of this thread.

ProtractorNinja commented 8 years ago

We still have much to consider, so this issue hasn't died quite yet. Tomorrow (Friday the 29th) I'll write more on what route I think we should take.

ProtractorNinja commented 8 years ago

I've made many critical remarks regarding the Hack Pack's current construction. My goal in this comment is to suggest a number of ideas and major refactoring steps that will dramatically improve and clarify the project.

First, here's an outline of how we should go about doing this:

  1. Figure out what we want for the hackpack, for contributors, and for maintainers.
  2. Gut the Hack Pack. Start it from scratch with the new plans in mind.
  3. Craft an optimized development/submission pipeline (add more jargon as needed).
  4. Bring in the best of the old Hack Pack as per (3).
  5. Add more until we can beat UCF every time.

This comment addresses mostly (1), but I feel that I should note why I think (2) is necessary. Because I think there's only a small amount of kickass content in the current hack pack, it would be more effective to redesign the Hack Pack from the ground up instead of trimming it down from what we have now. Doing so in haste would be just as catastrophic as our current situation (because we've been working towards deadlines for much of the time), so let's be thorough on this.

On to what I think we need. I'll provide use cases from two perspectives: users and submitters. First, our ideal user:

User is at a contest, and has at least a vague idea of what kind of data structure or algorithm they need, e.g. (1) a graph traversal algorithm of some kind, (2) a structure with fast insertions and automatic sorting, or (3) a quick way to generate a list of primes in Python. The user looks to the index ("graph", "O(1) insertion", "primes", etc.) or the Table of Contents ("Algorithms.Graphs", "Structures.Heap", etc.) and narrows down the sections they're looking for quickly, because each is short and easy to analyze. The use case stops here, because once the user has found the correct section, they have exactly what they need to craft a solution to their problem.

Key points of this case:

And our ideal contributor:

User wants to contribute code. User opens or finds an issue for discussing whether or not their idea would be useful, what should be added, and where it should go. If yes, user clones the repo, installs needed requirements, and adds content. Maintainers work with the user to ensure high-quality submissions. Once deemed complete, a maintainer merges new content into the hack pack.

Key points:

Our Hack Pack should be organized to optimize these use cases, with the contest attendee as the overriding priority. Here are some ideas I think we should start with:

  1. Drop the idea of the Hackpack++. We should not try to teach.
  2. Do not include contest problems in the hack pack. Examples of algorithms, data structures, and strategies should be short and generic.
  3. Emphasize code. There shouldn't be a lot of supplementary text; important details like algorithmic complexity should be prominent and similarly styled (e.g. as a large subtitle heading).

My ideal Hack Pack section looks something like this:

/Algorithms/Graphs/Dijkstra's Algorithm
Shortest path between edge-weighted nodes in O(N log N) time
Works with directed and undirected graphs

see /Structures/Graph for graph class

< minimum amount of code to implement Dijkstra follows; does not need int main(), etc. >
< comments annotate changes needed for variants >

It's short, useful, and only includes exactly what a reader would want.
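
For concreteness, here is a rough sketch (mine, not existing Hack Pack content) of what that minimal code block might look like, assuming an adjacency-list graph where adj[u] holds (neighbor, weight) pairs:

#include <vector>
#include <queue>
#include <limits>
using namespace std;

// Shortest distances from src to every node; works for directed and
// undirected graphs represented as adjacency lists.
vector<long long> dijkstra(const vector<vector<pair<int,int>>>& adj, int src)
{
    const long long INF = numeric_limits<long long>::max();
    vector<long long> dist(adj.size(), INF);
    // min-heap of (distance, node) pairs
    priority_queue<pair<long long,int>,
                   vector<pair<long long,int>>,
                   greater<pair<long long,int>>> pq;
    dist[src] = 0;
    pq.push(make_pair(0LL, src));
    while (!pq.empty()) {
        long long d = pq.top().first;
        int u = pq.top().second;
        pq.pop();
        if (d > dist[u]) continue; // stale heap entry
        for (const auto& e : adj[u]) {
            if (dist[u] + e.second < dist[e.first]) {
                dist[e.first] = dist[u] + e.second;
                // variant: add a heuristic to the pushed priority for A*
                pq.push(make_pair(dist[e.first], e.first));
            }
        }
    }
    return dist;
}

No int main(), no IO, and variant notes live in comments rather than in separate contest problems.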

Here also is a mockup of an idea for overall layout:

Thoughts?

mdclyburn commented 8 years ago

This plan sounds very radical, but I think it will actually solve a lot of our issues with the current system we have.

Perhaps my comment about unit testing was a bit short on reasoning and incendiary in nature. I would like to reiterate that the idea of 'unit testing' submissions is useless once the content is in the Hack Pack. We want others to contribute. Even if their implementation is not entirely correct, it's the job of the maintainers and reviewers to help others' content mature enough to make for a valid submission. As I said before, the purpose of unit tests is to ensure that regressions don't happen when changes are introduced. Each submission to the Hack Pack should be completely modular and isolated from everything else. Even if I added an erroneous implementation of a splay tree, the code for Dijkstra should not suddenly break. Making contributors write unit tests for their own code is a bit silly as well. It is the job of the maintainer to properly vet submissions. Of course, ideally, we would like everyone to be reasonably confident in the code they send our way, but we shouldn't have to go as far as wanting to unit test all the things.

So, with your idea, we would be looking at a much more intensive review of the code and content that is submitted for inclusion in the Hack Pack. This is pretty much a given and should largely replace the unit testing methods we have in place currently. Maintainers and contributors should work closely together to ensure that submissions are valid before they make it into the document.

I would be a little sad to see the didactic aim of the Hack Pack go out of the window (whoever contributed that KMP section sure put in a lot of work...), but the decision does seem to come out of common sense. Were we mistaken to even try to target everybody with a single document? I think so. Throwing out the heavier Hack Pack++ will also make our build system better, yes?

I still like the idea of the 'Core' section being towards the end of the Hack Pack. I realize that this section is probably not going to be all that large, but it also isn't hugely important (not after the first five minutes of the contest, anyway).

/Algorithms/Graphs/Dijkstra's Algorithm

Please tell me that is not a path... please...

This seems like a very viable way to handle compiling the Hack Pack for the future. I don't have any specific concerns, but organization is key.

ProtractorNinja commented 8 years ago

Luckily, no, that's not a filepath--just a path through the tree that matches my Table of Contents.

ProtractorNinja commented 8 years ago

Our build system does work at the moment, but removing the Hack Pack++ would really slim things down. The process doesn't have any major external dependencies that we'd be eliminating.

robertu94 commented 8 years ago

I agree that the plan sounds very radical. I don't quite see why we need to dump and reload all of the content. I also think that keeping history of the document is important even just for remembering why we wanted to go another way. Between git rm and fairly drastic cuts removing the "glut" we should be able to reorganize without dumping everything.

I am fine with the new organizational structure (chapter organization) you propose. I don't see it as much different from the current layout except renaming the appendix to be core and moving it to the front. Perhaps you could explain what I am missing?

I do agree that our index needs to be more useful. I would like to see some more concrete guidelines for indexing in your proposal.

I agree that the user should be competent, but I do think we need more guidelines here. For example, there is a big difference between say Brendan competent and generic programmer competent. I would like to see some concrete details spelled out.

I would try to favor robust example sets over the strictly most concise set where this makes sense. For example, consider some of the graph algorithms. Some of the variants can get fairly gnarly if you have never written them before. So, for example, we may have a general DFS, and the DFS variant that topologically sorts.
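
To make that concrete, a hypothetical sketch in C++ (names are mine) of a general DFS where the single line that enables the topological-sort variant is called out in a comment:

#include <vector>
using namespace std;

// General recursive DFS over an adjacency list.
void dfs(const vector<vector<int>>& adj, int u,
         vector<bool>& seen, vector<int>& order)
{
    seen[u] = true;
    for (int v : adj[u])
        if (!seen[v]) dfs(adj, v, seen, order);
    order.push_back(u); // variant: this post-order push enables topo sort
}

// Topological sort of a DAG: DFS from every unvisited node,
// then reverse the resulting post-order.
vector<int> topo_sort(const vector<vector<int>>& adj)
{
    vector<bool> seen(adj.size(), false);
    vector<int> order;
    for (int u = 0; u < (int)adj.size(); u++)
        if (!seen[u]) dfs(adj, u, seen, order);
    return vector<int>(order.rbegin(), order.rend());
}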

You say, "Unit tests are exchanged for extensive review", but also "tests are not required." Could you please clarify?

In your example section, if we are dropping IO code, could I convince you to include a strong suggestion for docstring-like comments for each function as a code style feature?

Also, if we are dropping IO code, I would like the code to be written as part of a valid function. We wouldn't necessarily need a main/IO, but at least a function with arguments would be helpful. Additionally, for languages like Python without strict typing, a list of expected types should be specified either by name or by docstring.

I love Python, but I would emphasize using it only where it makes sense. For many problems, the algorithms turn out smaller in C++ because of the STL's more robust data structures. Additionally, most of our teams know C/C++ as their primary language. It may, however, be useful for string parsing or big integer problems. As such, I recommend at least making a suggestion to focus on C++ over Python.

I too will miss the Hackpack's didactic purpose, but I will follow the consensus.

If we are removing HackPack++, I would also remove the find+awk magic that we use for pre-processor macros. It will simplify the build process, remove unneeded tools, and greatly improve the readability of the Makefile.

ProtractorNinja commented 8 years ago

@mdclyburn:

I still like the idea of the 'Core' section being towards the end of the Hack Pack. I realize that this section is probably not going to be all that large, but it also isn't hugely important (not after the first five minutes of the contest, anyway).

I put it at the front precisely because you'd want that stuff during the first five minutes (so it's conceptually 'first'). Having it at the back would mean that it stays out of the way, though; I guess it could go either way.

@robertu94:

I don't quite see why we need to dump and reload all of the content ... I also think that keeping history of the document is important even just for remembering why we wanted to go another way. Between git rm and fairly drastic cuts removing the "glut" we should be able to reorganize without dumping everything.

Starting from scratch (that is, keeping the LaTeX but reducing it to a content-less skeleton) would be much cleaner and wouldn't add that much more work (at least on the elimination scale I'm thinking of). Why would we need any more history than what is contained in GitHub's Issues?

I am fine with the new organizational structure (chapter organization) you propose. I don't see it as much different from the current layout except renaming the appendix to be core and moving it to the front. Perhaps you could explain what I am missing?

I included that as a bit of an afterthought -- a starting point for more discussion. I think we'll need a new Issue for planning the new document's hierarchy.

The main difference at the moment is that the Algorithms section is organized by topic.

I do agree that our index needs to be more useful. I would like to see some more concrete guidelines for indexing in your proposal.

I don't have anything concrete for that yet, aside from the observation that the current index isn't helpful. That's a topic for a new Issue, once we resolve high-level plans and problems.

I agree that the user should be competent, but I do think we need more guidelines here. For example, there is a big difference between say Brendan competent and generic programmer competent. I would like to see some concrete details spelled out.

I would try to favor robust example sets over the strictly most concise set where this makes sense. For example, consider some of the graph algorithms. Some of the variants can get fairly gnarly if you have never written them before. So, for example, we may have a general DFS, and the DFS variant that topologically sorts.

I'm pretty sure that that example falls under what I meant, although I agree it wasn't 100% obvious from what I wrote. Nothing needs to be absolute; we just need to encourage an optimal balance between conciseness and clarity.

I'm guessing you got the idea of "most concise" from my "minimum amount of code to implement...". I should have been more clear; I meant "minimum" to go along with "no need for unrelated supplements". "Absolutely as little code as possible" is not a good thing to strive for.

You say, "Unit tests are exchanged for extensive review", but also "tests are not required." Could you please clarify?

Better wording would have been 'unit testing is exchanged for extensive manual review'. Those two quotes mean the same thing.

In your example section, if we are dropping IO code, could I convince you to include a strong suggestion for docstring-like comments for each function as a code style feature?

Also, if we are dropping IO code, I would like the code to be written as part of a valid function. We wouldn't necessarily need a main/IO, but at least a function with arguments would be helpful. Additionally, for languages like Python without strict typing, a list of expected types should be specified either by name or by docstring.

I was looking through the Hack Pack again, and realized that most of the code that exists isn't as troublesome as I previously thought. Many of the computational geometry examples are already solid if a smidgen messy (my favorites are just includes and a well-written function); quite a few of the contest examples have decent but unclear code that'd be a lot better if they weren't tied to a specific example problem.

I don't think we need a concrete specification for what exactly a code sample needs, except that it should be useful and modular. Sometimes this could be just a function; sometimes a full program; sometimes a few functions and a class. I think we should encourage understandable code (which may have comments) instead of trying to shoehorn in mandatory documentation.

Many algorithms involve building their base structures first. That's an important thing to include somewhere, but one that I think contributes to clutter. One solution to this would be standardizing the structures that each algorithm may use and defining them (and how to fill them up / build them) in the Structures section, then working with an already-complete version in the Algorithms section.
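
As a rough sketch of that idea (hypothetical; nothing like this exists in the repo yet), the Structures section would define the class once, and every graph algorithm would then take it ready-made:

#include <vector>
using namespace std;

// Defined once in Structures, including how to build it.
struct Graph {
    int n;
    vector<vector<pair<int,int>>> adj; // adj[u] = (neighbor, weight) pairs
    explicit Graph(int n) : n(n), adj(n) {}
    void add_edge(int u, int v, int w = 1, bool directed = false) {
        adj[u].push_back(make_pair(v, w));
        if (!directed) adj[v].push_back(make_pair(u, w));
    }
};

// An Algorithms-section entry then starts from a finished Graph, e.g.:
// vector<long long> dijkstra(const Graph& g, int src);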

Come to think of it, it might even be sensible to combine the two sections into Data Structures and Algorithms. The Graph section, for instance, could begin with graph classes, then go on to specific algorithms that involve those classes. I'm not sure if merging them in this way is entirely advantageous, though; I'd like to hear everyone's opinions on organization once we get a new issue going for it.

I love Python, but I would emphasize using it only where it makes sense. For many problems, the algorithms turn out smaller in C++ because of the STL's more robust data structures. Additionally, most of our teams know C/C++ as their primary language. It may, however, be useful for string parsing or big integer problems. As such, I recommend at least making a suggestion to focus on C++ over Python.

I haven't mentioned anything about suddenly favoring Python over C++.

If we are removing HackPack++, I would also remove the find+awk magic that we use for pre-processor macros. It will simplify the build process, remove unneeded tools, and greatly improve the readability of the Makefile.

Yes, there's no reason to keep anything we don't need.

robertu94 commented 8 years ago

@robertu94:

I don't quite see why we need to dump and reload all of the content ... I also think that keeping history of the document is important even just for remembering why we wanted to go another way. Between git rm and fairly drastic cuts removing the "glut" we should be able to reorganize without dumping everything.

Starting from scratch (that is, keeping the LaTeX but reducing it to a content-less skeleton) would be much cleaner and wouldn't add that much more work ...

After hearing your explanation, I think that we have the same idea.

I am fine with the new organizational structure (chapter organization) you propose. I don't see it as much different from the current layout except renaming the appendix to be core and moving it to the front. Perhaps you could explain what I am missing? ... I think we'll need a new Issue for planning the new document's hierarchy ...

Agreed

I do agree that our index needs to be more useful. I would like to see some more concrete guidelines for indexing in your proposal.

I don't have anything concrete for that yet, aside from the observation that the current index isn't helpful. That's a topic for a new Issue, once we resolve high-level plans and problems.

Agreed

I agree that the user should be competent, but I do think we need more guidelines here. For example, there is a big difference between say Brendan competent and generic programmer competent. I would like to see some concrete details spelled out.

...I'm guessing you got the idea of "most concise" from my "minimum amount of code to implement...". I should have been more clear; I meant "minimum" to go along with "no need for unrelated supplements". "Absolutely as little code as possible" is not a good thing to strive for.

Yes that is what I was getting at.

You say, "Unit tests are exchanged for extensive review", but also "tests are not required." Could you please clarify?

Better wording would have been 'unit testing is exchanged for extensive manual review'. Those two quotes mean the same thing.

Thanks for the clarification.

Just a philosophical point: suppose as the reviewer, the way that I review the code is by writing and executing unit tests. Should I commit my tests to the repository?

In your example section, if we are dropping IO code, could I convince you to include a strong suggestion for docstring-like comments for each function as a code style feature?

Also, if we are dropping IO code, I would like the code to be written as part of a valid function. We wouldn't necessarily need a main/IO, but at least a function with arguments would be helpful. Additionally, for languages like Python without strict typing, a list of expected types should be specified either by name or by docstring.

I don't think we need a concrete specification for what exactly a code sample needs, except that it should be useful and modular. Sometimes this could be just a function; sometimes a full program; sometimes a few functions and a class. I think we should encourage understandable code (which may have comments) instead of trying to shoehorn in mandatory documentation.

I guess that doc-strings aren't as important for C/C++, but for weakly/duck typed code, I think they are essential for understanding how the code works.

To be concrete, here is what I had in mind, using prefix sum as an example for both C++ and Python:

C++:

// Returns the sum of v[0] through v[index]; assumes index < v.size().
int prefix_sum(const vector<int> & v, const unsigned int index)
{
    int sum = v[0];
    for(unsigned int i = 1; i <= index && index < v.size(); i++) sum += v[i];
    return sum;
}

Python:

def prefix_sum(v,index):
    """
    int[] v: list of numbers to sum
    int index: a non-negative integer less than len(v)
    """
    return sum(v[0:index+1])

One solution to this would be standardizing the structures that each algorithm may use and defining them (and how to fill them up / build them) in the Structures section, then working with an already-complete version in the Algorithms section.

Agreed.

ProtractorNinja commented 8 years ago

Just a philosophical point: suppose as the reviewer, the way that I review the code is by writing and executing unit tests. Should I commit my tests to the repository?

No.

I guess that doc-strings aren't as important for C/C++, but for weakly/duck typed code, I think they are essential for understanding how the code works.

To be concrete, here is what I had in mind, using prefix sum as an example for both C++ and Python:

That depends on the kind of code we want people to write--or what kind of code a particular situation calls for.

On one hand, you could fix your Python example by renaming 'v' to 'vector' or something similar. This particular function is simple enough that it doesn't really need any documentation, and it's the sort of function that's probably not going to be changed later by the programmer. Using 'vector' or another verbose variable name means that it'll take a little longer to type, but not enough to outweigh the benefits of being intuitive.

On the other hand, code that will probably need to be updated or changed or is generally confusing may warrant better naming, documentation notes, or other tricks on a per-case basis.

I think I'd rather see code with longer names (i.e. 'self-documenting code') and less documentation, because it's somewhat less likely that someone, after copying code, will have to refer back to the hack pack later because they forgot or didn't record what their updated code's one-character variables mean.
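
To sketch what I mean with the earlier C++ prefix sum (my names, purely illustrative): the identifiers carry the documentation, so no comment block is needed.

int prefix_sum(const vector<int>& values, const unsigned int last_index)
{
    int running_total = values[0];
    for (unsigned int i = 1; i <= last_index && last_index < values.size(); i++)
        running_total += values[i];
    return running_total;
}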

robertu94 commented 8 years ago

Just a philosophical point: suppose as the reviewer, the way that I review the code is by writing and executing unit tests. Should I commit my tests to the repository?

No.

Why not? They would be useful to the developer in the cases where they don't pass. They also could provide insight as to what something was tested against. Could I get your thoughts here?

ProtractorNinja commented 8 years ago

Why not? They would be useful to the developer in the cases where they don't pass. They also could provide insight as to what something was tested against. Could I get your thoughts here?

Hmm... perhaps I was still reacting to the idea of enforcing unit tests everywhere. I think it would be fine to allow storing unit tests in the repository, as long as they're useful and we don't require all code to be submitted in a manner that is testable.

mdclyburn commented 8 years ago

Just a philosophical point: suppose as the reviewer, the way that I review the code is by writing and executing unit tests. Should I commit my tests to the repository?

I do not really have a problem with this, but we then end up with this weird inconsistency where some code has unit tests and some does not. I am not against saving unit tests, but why would we need them in the future if we only merge code that is known to work and should not change in the near future?

ProtractorNinja commented 8 years ago

@mdclyburn: That's something I was wondering about.

I'm leaning towards saying "no" to tracking extraneous tests and stuff except in very special cases, as a way of keeping things clean.

robertu94 commented 8 years ago

I don't see how tracking tests can really hurt us, and in the case that we find a problem with code that has been committed, we can know what a program has been tested against. I just don't see them as extraneous. Additionally, if your concern is consistency, we are already going to have differing levels of detail and files related to a topic if we match the level of detail to the complexity of the problem.

ProtractorNinja commented 8 years ago

The only time we'd be revisiting code that we saved unit tests for would be when those unit tests failed to verify that the code works.

Unit tests, if used for reviewing, can just as well be inserted into a code block in the pull request for the code in question.

Tests should stay out of the repo.

robertu94 commented 8 years ago

@ProtractorNinja Do you even hear what you are saying!?

You are suggesting tracking code via the GitHub pull request comment system. Do you realize how ridiculous that is? Emailing the unit-tests back and forth would be better than that because at least that is distributed. You also essentially lock us in to GitHub permanently as our source code repo provider because our project history will be in a format that we cannot easily extract.

Which situation would you rather be in, a case where you have some idea of what has been tried before or one where you have none?

If you both really don't want to commit tests, then I won't do it, but for the love of sanity, don't use GitHub comments as your means of distributing any type of code.

mdclyburn commented 8 years ago

My problem with the whole unit testing system we used before is that we tested the sample problems that had the algorithm/data structure implemented. This did not apply to any sort of reference code that we could have written (what reference code did we have that was unit tested?)... I actually disliked the whole idea of extracting a solution from a sample problem's solution. I understand wanting to reuse code for testing things in the future should it be necessary, but unit testing is too powerful of a solution. It adds a lot of bulk to submissions. We're trying to make it easier to add content, and I think being a responsible reviewer should necessitate testing a sufficient, effective set of inputs.

It's a difficult topic for me to debate. I see the merits of both ideas, but I think our small group size makes unit testing a hefty, unnecessary solution.

mdclyburn commented 8 years ago

We have made some headway, but I am very disappointed by the overall lack of progress on this issue. Continuing like this is futile. I am closing this issue to further discussion. I'm going to take some time to deliberate and will get back to you both with a plan of action to resolve the future of the Hack Pack.