cobalt-uoft / uoft-scrapers

Public web scraping scripts for the University of Toronto.
https://pypi.python.org/pypi/uoftscrapers
MIT License

Repo structure #39

Closed munrocape closed 8 years ago

munrocape commented 8 years ago

Why are the scrapers themselves in the folders' __init__ files? I thought __init__ files were just for proper import / folder structure.

qasim commented 8 years ago

After reading up on modules/packages a little, you are correct!

I'm going to read more into it after my exam tomorrow and can propose a new directory/file structure that makes more sense. If you have any ideas, throw them around 😁

qasim commented 8 years ago

So I see 2 directory structures that can be implemented.

With __main__.py instead:

uoftscrapers/
├── __init__.py
└── scrapers
    ├── buildings
    │   └── __main__.py
    ├── calendar
    │   └── utsg
    │       └── __main__.py
    ├── coursefinder
    │   └── __main__.py
    ├── exams
    │   ├── utm
    │   │   └── __main__.py
    │   ├── utsc
    │   │   └── __main__.py
    │   └── utsg
    │       └── __main__.py
    ├── food
    │   └── __main__.py
    ├── scraper
    │   ├── __main__.py
    │   └── layers
    │       └── __main__.py
    ├── textbooks
    │   └── __main__.py
    └── timetable
        ├── utm
        │   └── __main__.py
        ├── utsc
        │   └── __main__.py
        └── utsg
            └── __main__.py

Its usage is documented here. A __main__.py lets a package be run from the command line (python -m <package>), but we don't need that in this case (everything is a class, with no top-level functions or command-line entry point). So it might be more appropriate to do the following:

uoftscrapers/
├── __init__.py
└── scrapers
    ├── utils
    │   ├── scraper.py
    │   └── layers.py
    ├── buildings.py
    ├── calendar
    │   └── utsg.py
    ├── coursefinder.py
    ├── exams
    │   ├── utm.py
    │   ├── utsc.py
    │   └── utsg.py
    ├── food.py
    ├── textbooks.py
    └── timetable
        ├── utm.py
        ├── utsc.py
        └── utsg.py

I'll ponder over this a little more, and welcome opinions.
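As a rough illustration of how imports could look under the second layout, here is a minimal sketch. It builds a throwaway copy of part of the tree in a temp directory; the package name demo_scrapers and the class names Buildings and UTSGExams are assumptions made for the example, not names taken from the repo. The __init__.py files only re-export classes and hold no scraper logic.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway copy of part of the proposed layout in a temp directory.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_scrapers"  # hypothetical package name
(pkg / "exams").mkdir(parents=True)

# Each scraper lives in a plain module; the class names are assumptions.
(pkg / "buildings.py").write_text("class Buildings:\n    pass\n")
(pkg / "exams" / "utsg.py").write_text("class UTSGExams:\n    pass\n")

# __init__.py files only re-export names so callers get short imports.
(pkg / "exams" / "__init__.py").write_text("from .utsg import UTSGExams\n")
(pkg / "__init__.py").write_text(
    "from .buildings import Buildings\n"
    "from .exams import UTSGExams\n"
)

# Make the temp directory importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Short form, via the re-exports in __init__.py.
from demo_scrapers import Buildings, UTSGExams

# Long form, straight from the module that defines the class.
from demo_scrapers.exams.utsg import UTSGExams as Direct

print(Buildings.__module__)
print(UTSGExams is Direct)
```

Both import forms resolve to the same class object, so the re-exports are purely a convenience for callers.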

kashav commented 8 years ago

I think I prefer the second; it looks cleaner. What would import statements look like for the first?

Would it be something like: from scrapers.buildings.__main__ import Buildings or is there a nicer way to import classes from __main__.py files?

qasim commented 8 years ago

It takes the name of the folder, so imports would be the same (I think).
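A quick check suggests this may not hold: importing a package runs its __init__.py, while __main__.py is only run for python -m <package>. The sketch below uses a hypothetical throwaway package (demo_main_pkg, built in a temp directory) with a Buildings class defined only in __main__.py, and tries both import forms.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway package: demo_main_pkg/buildings/ with the class in __main__.py.
# All names here are hypothetical and exist only for this check.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_main_pkg" / "buildings"
pkg.mkdir(parents=True)
(root / "demo_main_pkg" / "__init__.py").write_text("")
(pkg / "__init__.py").write_text("")
(pkg / "__main__.py").write_text("class Buildings:\n    pass\n")

# Make the temp directory importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Importing the package runs __init__.py only, so the class is not found.
try:
    from demo_main_pkg.buildings import Buildings
    found_via_package = True
except ImportError:
    found_via_package = False

# Naming the __main__ submodule explicitly does work.
from demo_main_pkg.buildings.__main__ import Buildings as FromMain

print(found_via_package)
print(FromMain.__name__)
```

So with the first layout, callers would need the longer from scrapers.buildings.__main__ import Buildings form, or an __init__.py that re-exports the class.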