chaoss / augur

Python library and web service for Open Source Software Health and Sustainability metrics & data collection. You can find our documentation and new contributor information easily here: https://oss-augur.readthedocs.io/en/main/ and learn more about Augur at our website https://augurlabs.io
MIT License

Use of Django #22

Closed abuhman closed 7 years ago

abuhman commented 7 years ago

Earlier, Derek and I discussed using Django vs Flask for our web portion, and we had initially decided on Flask. However, Matt and I discussed today that Django may allow for a known organization of what goes where in the code, which may help others to understand our project. This could lead to easier contributions from others if they also understand the Django framework.

I had installed Django for some of the initial work I did on learning to connect to the API, and I found it was quite easy to use. Here is a tutorial about making a Django app: https://docs.djangoproject.com/en/1.10/intro/tutorial01/. One thing I did not follow from the tutorial was setting it up to work with a database (I imported a separate driver), so if we were to follow the framework we'd probably need to do it that way instead. I am about to head to the lab meeting, but I have a views.py I will also post later to show an example of using Django (though I admit I didn't organize it correctly to be Django-like).

sgoggins commented 7 years ago

@abuhman : I have not used Django before, but my experience with other frameworks is that database connections are often handled better by something that sits outside the framework. Let's keep exploring.

In terms of framework maturity, I think Django is a more solid foundation to build on, particularly given the speed we want to work at. Let's set up a prototype system and do some example apps for health measures in Django over the next few days and report back to the rest of the team. :) Thanks for pointing this out!!

howderek commented 7 years ago

I disagree that we should use Django. Django's advantage is that it has a built-in ORM plus a templating language. We need neither of those because we are building a REST API that only takes GET requests, and a static page that knows how to interact with that API.

Since we are using someone else's schema for most of our data, Flask provides the functionality we need. Flask is better for applications that require more custom functionality on the backend. In other words, Django is more opinionated than Flask. Our project is not one that would fit well into that mold.

Our backend will never make changes to the GHTorrent database based on user input. We will never need authorization. We don't want to render anything on the backend. The advantages Django comes with are overkill for this project, because the bulk of our code will be collecting metrics in a way we can manipulate easily. Dressing those metrics up with a REST API and shoving them into JSON are trivial in comparison. Flask is far more minimalist than Django, and that's what we want.

As far as Django being more widespread, that's true, but Flask is also a well-known framework. Pinterest, Twilio, LinkedIn, Uber, and Mailgun are some examples of well-known companies that use Flask.

I look forward to viewing the Django approach, and if it really does seem better I'll be more open to the suggestion, but right now I don't feel like the advantages Django has make sense for our project.

abuhman commented 7 years ago

Here it is:

https://github.com/OSSHealth/ghdata/blob/dev/views.py

This is the views.py file of the local Django framework I had set up.

howderek commented 7 years ago

@abuhman I would strongly recommend using the ORM to prevent SQL injection and using Django's templating language to render those layouts.

abuhman commented 7 years ago

To my knowledge, SQL injection cannot happen if there is no way for users to provide input (as is the case in this views.py). This views.py was also intended to only ever be run locally by me (it was a learning experiment), never on a real web server. However, SQL injection is definitely important once we start using something like the GitHub URL to construct queries and having code that runs on a real web server.

As far as Django's templating language, I can look into that. It also depends on what outside contributors are most comfortable with. If the templating language sees widespread use and understanding, it may be a good thing to use.

I also think if this was shared code it should have had more comments explaining some of the logic. After having not looked at this in a while, I can see that some of the logic is not self-explanatory. If this was for main repository code, it could also probably use a bit of cleaning for the same reason.

howderek commented 7 years ago

Users cannot provide input, but external input is being provided by the GitHub API (response -> eventData). External input should never be trusted. GitHub is probably not going to create an event type called '); DROP ALL TABLES; but it's still important not to inject external data into a query directly.
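A minimal sketch of that point, assuming Python's built-in sqlite3 module as a stand-in for whatever driver views.py used. The `events` table and the `count_events` helper are hypothetical. The external value is passed as a bound parameter instead of being concatenated into the SQL string, so the driver treats it purely as data.

```python
import sqlite3


def count_events(conn, event_type):
    """Count rows whose type matches event_type (hypothetical helper)."""
    # The ? placeholder makes the driver bind event_type as a value,
    # so it is never parsed as part of the SQL statement.
    row = conn.execute(
        "SELECT COUNT(*) FROM events WHERE type = ?", (event_type,)
    ).fetchone()
    return row[0]


# In-memory database with a hypothetical events table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (type TEXT)")
conn.executemany(
    "INSERT INTO events (type) VALUES (?)",
    [("PushEvent",), ("PushEvent",), ("IssuesEvent",)],
)

# A hostile value of the kind an external API could, in principle, return.
hostile = "'); DROP TABLE events; --"

print(count_events(conn, "PushEvent"))  # matches the two PushEvent rows
print(count_events(conn, hostile))      # matches nothing; table is untouched
```

Placeholder syntax varies by driver (sqlite3 uses `?`, while common MySQL drivers use `%s`), but the idea of binding values separately from the query text is the same.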

Templating (whether Django's templating language, or others) is definitely a good thing. Your business logic should be completely separate from the logic to display it.
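To illustrate the separation, here is a small sketch using only the standard library's `string.Template` in place of a full templating language such as Django's. The function names and the event data are hypothetical. One function computes the numbers and knows nothing about HTML; the other only formats what it is given.

```python
import html
from string import Template


def summarize_events(events):
    """Business logic: count events by type. No markup in here."""
    counts = {}
    for event in events:
        counts[event["type"]] = counts.get(event["type"], 0) + 1
    return counts


# Display logic lives in a template, separate from the computation.
ROW = Template("<li>$name: $count</li>")


def render_counts(counts):
    """Display logic: turn already-computed counts into an HTML list."""
    rows = "".join(
        # Escape names because they originate from external data.
        ROW.substitute(name=html.escape(name), count=count)
        for name, count in sorted(counts.items())
    )
    return "<ul>" + rows + "</ul>"


events = [{"type": "PushEvent"}, {"type": "PushEvent"}, {"type": "IssuesEvent"}]
print(render_counts(summarize_events(events)))
```

With this split, the counting function can be tested or reused (for example, behind a JSON API) without touching any markup.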

abuhman commented 7 years ago

I see what you mean. I had forgotten about the GitHub API input. Thanks for pointing that out.

I'm new to Django, but I believe that it normally does separate business logic from display logic (views.py is generally not intended to contain business logic), or at least that most frameworks do. However, I didn't attempt to follow that for a local experiment. For something this small, combining all the logic in a single file was faster to develop.

howderek commented 7 years ago

@abuhman I totally understand. I believe, however, that when you do create a Django project following best practices you'll see why Flask is better for our project. Django isn't very Pythonic, nor is it easy. Django is easier for building apps that are what most people think of when they say "webapp" (user authentication, interactions, control panels, etc.), but for our application it just doesn't make sense. It adds a whole layer of complexity that doesn't need to be there.

abuhman commented 7 years ago

While looking for information on test frameworks, I found a page with a list of links to resources on Django. Posting it here for relevance to the conversation, though I haven't examined any of the books or tutorials: https://www.fullstackpython.com/django.html.

As far as Flask vs Django, I think Matt is finding that Django is more widely used or well known at the conference he is at, which is another consideration. I admit I'm not very experienced with either one.

howderek commented 7 years ago

There's no doubt that Django is more widely used than Flask. If we are trying to expose our project to as many people as possible, I'd argue we shouldn't be using Python, we should be using JavaScript. Python is more widely used in scientific communities however, and I think GHData's current architecture is geared to the scientific community.

ghdata.py currently returns Pandas DataFrames, which will be useful for Python developers to integrate into their existing analyses. server.py takes those DataFrames and serializes them into JSON and spits them out (using Flask decorators) for both our web app and those who want to consume the data in their own dashboards. If our sole goal is to create a web application that monitors the health of GitHub repos, we should use JavaScript (probably Express.js) instead of Python. JavaScript has a larger open source community and more packages than any other language. It's also more intuitive to make an API with Express than Django because JavaScript speaks JSON natively.

If the main thing we care about is creating an open source project that others can contribute to, I still don't think Django is the best choice.
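The ghdata.py / server.py split described above could look roughly like the following sketch. It is not the project's actual code: the `commits_per_week` function, its sample rows, and the route are all hypothetical, and it assumes pandas and Flask are installed. A metric function returns a DataFrame that Python users can consume directly, and a thin Flask route serializes the same DataFrame to JSON.

```python
import pandas as pd
from flask import Flask, Response

app = Flask(__name__)


def commits_per_week(owner, repo):
    """Hypothetical metric function standing in for a ghdata.py call."""
    # Hard-coded sample rows; a real implementation would query a database.
    return pd.DataFrame({
        "week": ["2017-01-02", "2017-01-09"],
        "commits": [12, 7],
    })


# Hypothetical GET-only endpoint standing in for a server.py route.
@app.route("/<owner>/<repo>/commits", methods=["GET"])
def commits(owner, repo):
    df = commits_per_week(owner, repo)
    # orient="records" produces a JSON array with one object per row.
    return Response(df.to_json(orient="records"), mimetype="application/json")
```

Researchers can call `commits_per_week` directly and keep working with the DataFrame, while dashboards and other tools fetch the JSON over HTTP.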

howderek commented 7 years ago

Let me explain how I'm seeing this.

I made architectural decisions thinking about 4 audiences:

  1. Researchers who just want the raw metrics to play around with.
  2. Companies that want to use the data in their own tools.
  3. Projects that want to integrate GHData stats into their websites.
  4. Individuals that want to look at the health of a given repo.

I'm working on four deliverables for those audiences:

  1. A wrapper for the raw data for the researchers (ghdata.py)
  2. The API for companies that want to integrate it into their own frontends (server.py)
  3. The JavaScript library for projects that want to integrate it on the frontend (ghdata-api-client.js)
  4. The dashboard for users (frontend/index.html + associated styles and scripts)

Now it feels like all we care about is the dashboard and all the other work I've done isn't valued. I'm okay with that if there are reasons my architecture is poorly designed, but it comes across like my work isn't valued because it doesn't use what other projects are using. Now the whole reason Django is popular is because it designs the architecture for you. I strongly feel the architecture it prescribes is not right for us, but I've heard no argument as to why it is, only that it's easier for outsiders to contribute to and more projects use it.

I agree this project should be easier to contribute to. It needs better documentation and it needs tests. That being said, I'm working my butt off trying to get as many metrics done as possible before each meeting, and I don't have time to do those things if I'm focusing on metrics. We don't even know which data are important yet, but we are approaching this project from the stance of "how can we invite more people in?"

That's important, but is it the most important thing right now?

Django would not help us be more productive. It would slow us way down, and the most unattractive thing to outsiders certainly isn't Flask, it's a project that doesn't even do anything. Switching architectures would mean starting over.

Also, I'm not even sure Django would be easier to contribute to. Flask is as simple as:

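from flask import Flask, jsonify

app = Flask(__name__)

# <blah> captures that URL segment and passes it to the view as a string
@app.route('/your/endpoint/here/<blah>/')
def funFunction(blah):
    message = {"message": blah}
    # jsonify builds a JSON response with the right Content-Type header
    return jsonify(message)

if __name__ == '__main__':
    # Start Flask's built-in development server
    app.run()

sgoggins commented 7 years ago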

@howderek & @abuhman : Here's the thing: We're working together and trying to share the project with a wide range of consuming projects on GitHub. Let's have a call tomorrow. You are both highly valued!

germonprez commented 7 years ago

@howderek and @abuhman -- yes, both valued.

@howderek -- reading your last post several times, I'm seeing your point of concern, and it would be easier to talk through this via a call (I had a long response written and it was too long).

I'm planning on being in the Reno airport tomorrow at about 12 Central. Would that work?

howderek commented 7 years ago

@germonprez Yes, I am available tomorrow anytime after 10:30am! I will email you my cell phone number.

abuhman commented 7 years ago

This is a late response, but I'm available.