fklemme opened this issue 3 years ago
This is actually something I was investigating/starting to work on setting up myself, at least for #2 (or a variation of it, i.e. using nginx `autoindex_format json` output). I'll poke around at it later and maybe toss some thoughts here; putting this comment here to remind myself 😄
i agree & think that local file support is the future of this project - was planning on having a crack at it this week.
imo the way to do this while minimising structural changes to the project is to index local files within the current three elasticsearch indexes, which means modifying those ORM tables to support both local files and gdrive files. it might also make sense to have a user interface for linking folders to mpc autofill, plus a second database updater script which crawls local files and is accessible with a button press rather than requiring the command line. in that instance, we'd want a django setting for local mode vs hosted, and only enable local file support in local mode.
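to make that concrete, the ORM change could look something like this - a very rough sketch only, with made-up model and field names rather than the project's actual schema:

```python
# rough sketch: a source-type discriminator so one table can hold both
# gdrive and local images. all names here are illustrative.
from django.db import models


class Card(models.Model):
    class SourceType(models.TextChoices):
        GOOGLE_DRIVE = "gdrive"
        LOCAL_FILE = "local"

    source_type = models.CharField(
        max_length=16, choices=SourceType.choices, default=SourceType.GOOGLE_DRIVE
    )
    # holds a drive ID for gdrive cards, or a file path for local cards
    identifier = models.CharField(max_length=500)
    name = models.CharField(max_length=200)
```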
google drive is baked into mpc autofill fairly deeply so this'd involve a bit of frontend work to handle local files differently to gdrive images as well.
storing local file paths in xml and using them when autofilling should be fairly straightforward.
thoughts?
One challenge (or opportunity) that comes with local files is the Docker setup. All Docker applications run in their own virtual filesystem that is decoupled from the real filesystem the user sees. To make a local folder visible to django/nginx, the user has to mount it before starting Docker, looking maybe something like this: `-v "D:\My Proxies":/cards`. So, the local folder `D:\My Proxies` will be visible as `/cards` to django/nginx. (Of course, we can make a config file or something for the path.) The downside of this: the path will look different for django and for the user. The upside: the path will always be the same (`/cards/...`), no matter what directory the user mounts. Maybe we can leverage this!

The biggest challenge that comes along with this: if django only sees `/cards/otto/swamp.jpg`, what do we put in the orders.xml? A simple solution could be to also serve `/cards` with nginx; the path then becomes `http://localhost:8000/cards/otto/swamp.jpg` and the autofill client just downloads the card as usual. Maybe this is the most portable approach. This would even work in a hosted situation where the host wants to offer local files as well.
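To illustrate the path translation (just a sketch on my side; the host and port are assumptions):

```python
# Rough sketch: translate a path as django sees it inside the container
# (/cards/...) into a URL the autofill client can download from.
from pathlib import PurePosixPath

CARDS_ROOT = PurePosixPath("/cards")
BASE_URL = "http://localhost:8000/cards"


def card_url(container_path):
    # strip the mount point and re-root the path under the nginx URL
    relative = PurePosixPath(container_path).relative_to(CARDS_ROOT)
    return f"{BASE_URL}/{relative}"


print(card_url("/cards/otto/swamp.jpg"))
# -> http://localhost:8000/cards/otto/swamp.jpg
```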
In general, I would prefer a solution where the user is able to use Google Drive and local files at the same time, if they like. That wouldn't be a problem, would it?
Last but not least, it would be great if the Google Service Account (`client_secrets.json`) were only required if Google Drives are configured in `drives.csv`. I'm not too familiar with django and the code, so maybe this is already the case.
that's a very good point - i dig the idea of optionally pointing the docker image at a single directory to index (and we could set this up such that the docker config modifies a django setting, meaning the project is still usable without docker). it may also be possible to run the autofill script from within the docker image, eliminating the need for downloading the images from the docker image back to the host file system.
definitely planning on retaining google drive functionality alongside local file support!
re: `client_secrets` - the file is only required by the drive crawler management script (`update_database.py`). once drive files are indexed, the frontend retrieves thumbnail images and supports downloading the full res images without needing to authenticate with google.
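for what it's worth, guarding the auth step could be as simple as something like this (sketch only - the `read_drives` helper is made up, not the actual code):

```python
# sketch: skip google auth entirely when drives.csv is empty or missing.
import csv
import os


def read_drives(path="drives.csv"):
    # treat a missing drives.csv the same as an empty one
    if not os.path.exists(path):
        return []
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


drives = read_drives()
if drives:
    # only now is client_secrets.json actually needed
    if not os.path.exists("client_secrets.json"):
        raise SystemExit("client_secrets.json is required when drives.csv is non-empty")
```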
Having one folder configured in an env variable might be just it. Django can read that, and docker-compose can read that (to then mount it and re-route django to, e.g., `/cards`). Still, it would be good if we serve the images through staticfiles for portability, rather than letting the HTML point to files on the local filesystem. I'm not 100% sure how to do this best in a development setup, though.
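For example, something like this in the settings (just a sketch; the variable name is only a suggestion):

```python
# settings.py sketch: read the cards folder from an environment variable,
# so docker-compose and a bare-metal setup configure it the same way.
import os

# docker-compose would set this to the mount point, e.g. /cards;
# a non-docker user would point it at their folder directly.
LOCAL_FILE_INDEX = os.environ.get("LOCAL_FILE_INDEX", "/cards")
```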
Oh man, I never even considered putting the client into Docker as well. :heart_eyes: Got to look into that. The tricky part is that Docker doesn't offer a GUI natively. So maybe we need the user to enter MPC credentials, run Chrome headless, and store the uploaded project (as offered by MPC). Otherwise, the user might need something like a VNC client to display the Chrome instance running in Docker. But I will have a look at what options are available.
I am only a novice self-taught programmer, so I don't know that there's much I can offer to help, but if you guys could add local folder support, or even network drive support, that would be absolutely amazing. Now I just need to figure out how to get this running as a Docker container on my unRAID server haha.
(Also, the URLs shown in the xml file in the video are internal to my LAN only, so don't bother typing them out)
(edit, in case the URL above is broken because thanks GitHub: https://www.youtube.com/watch?v=piI_EMZVgZs)
in terms of the local tool, i think the way to go will be storing local files' paths in the `<id>` tag in xml - it'd be difficult to determine from the xml whether an image is from google drive or is locally stored without changing the xml schema. adding support for this in my local tool rewrite branch: https://github.com/chilli-axe/mpc-autofill/tree/local-tool-rewrite (the remote isn't up to date atm but i'll push my changes shortly). e.g. this is working for me:
```xml
<order>
    <details>
        <quantity>12</quantity>
        <bracket>18</bracket>
        <stock>(S30) Standard Smooth</stock>
        <foil>false</foil>
    </details>
    <fronts>
        <card>
            <id>G:\Google Drive\Chilli_Axe's MTG Renders\0. White\Academy Rector.png</id>
            <slots>1,2,0</slots>
            <name>Academy Rector.png</name>
            <query>academy rector</query>
        </card>
        <card>
            <id>G:\Google Drive\Chilli_Axe's MTG Renders\1. Blue\1. Search for Azcanta.png</id>
            <slots>3,4</slots>
            <name>Search for Azcanta.png</name>
            <query>search for azcanta</query>
        </card>
        <card>
            <id>G:\Google Drive\Chilli_Axe's MTG Renders\6. Colourless\All Is Dust (Secret Lair).png</id>
            <slots>5,6,7,8,9,10,11</slots>
            <name>All Is Dust (Secret Lair).png</name>
            <query>all is dust</query>
        </card>
    </fronts>
    <backs>
        <card>
            <id>G:\Google Drive\Chilli_Axe's MTG Renders\1. Blue\1. Azcanta, the Sunken Ruin.png</id>
            <slots>3,4</slots>
            <name>Azcanta, the Sunken Ruin.png</name>
            <query>azcanta sunken ruin</query>
        </card>
    </backs>
    <cardback>G:\Google Drive\Chilli_Axe's MTG Renders\12. Cardbacks\Black Lotus.png</cardback>
</order>
```
nice work! the local tool in master is a truly horrible piece of code so i'm sorry about that but i'm improving i swear 😅
> ```xml
> <card>
>     <id>G:\Google Drive\Chilli_Axe's MTG Renders\0. White\Academy Rector.png</id>
>     <slots>1,2,0</slots>
>     <name>Academy Rector.png</name>
>     <query>academy rector</query>
> </card>
> ```
Will the client still "download" the files to a cards sub-folder? Please keep in mind that, in practice, people might have deep folder structures containing duplicate filenames, so some kind of ID should still be appended to the filename (maybe just a hash of the path?). Will you be extending the schema in your rewrite branch? (I didn't quite get that.)
in my branch, the logic is now:

parsing and validation step:

1. if the `<id>` tag points to a valid file, use this as the image's file path
2. otherwise, check `/cards` to see if the image exists without the drive ID in parentheses - this has the potential to cause file name collisions, but is what allows the tool to work when using the Download All button in the web app and moving those files to `/cards` - if a file exists at that path, use this as the image's path
3. otherwise, check `/cards` using the file's gdrive ID in parentheses

downloader threads:

* local files are uploaded to mpc directly from the paths they already exist at (they aren't copied to `/cards`)

not sure if i explained that clearly sorry but hopefully the sketch below makes it clearer!
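a rough version of that lookup order in code - the function and the `(drive id)` filename format shown are illustrative, not the exact code in the branch:

```python
# sketch of the lookup order described above
import os

CARDS_DIR = "cards"


def resolve_image_path(card_id, name, drive_id):
    # 1. the <id> tag points to a valid file on disk - use it directly
    if os.path.isfile(card_id):
        return card_id
    # 2. a file grabbed via the web app's "Download All" button and moved
    #    into /cards (no drive ID in the filename - collisions possible)
    plain = os.path.join(CARDS_DIR, name)
    if os.path.isfile(plain):
        return plain
    # 3. a file previously downloaded by the tool itself, disambiguated
    #    with the gdrive ID in parentheses
    base, ext = os.path.splitext(name)
    with_id = os.path.join(CARDS_DIR, f"{base} ({drive_id}){ext}")
    if os.path.isfile(with_id):
        return with_id
    # not found locally - hand the card off to the downloader threads
    return None
```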
edit: i had to squash some commits bc my local git is a bit cooked (painful to use gitkraken and a private github email address) but this is up to date now: https://github.com/chilli-axe/mpc-autofill/tree/local-tool-rewrite/autofill - not done yet but most of the way there
Sorry, I'm not that familiar with the codebase: wouldn't it be worth the effort to add another optional tag (e.g., `<path>`) to be explicit and avoid future confusion? Would that actually require many changes?
you're probably right yea - i suppose it's not a problem if some cards don't have a `<path>` tag! will think about it more but i'll probably end up doing this

one reason it's slightly nicer to do this w/ the id tag is that the common cardback (stupidly, i shouldn't have designed the xml schema this way) only has a single text field, and it's more consistent to assign that text to `drive_id` and go from there - it's probably a bad move to make a breaking change to the schema at this point
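if we did add it, parsing could stay backwards compatible along these lines (sketch only - nothing about the schema change is decided yet):

```python
# sketch: prefer an optional <path> tag, fall back to <id> for old files
import xml.etree.ElementTree as ET


def image_source(card):
    """Return ("local", path) or ("gdrive", drive_id) for a <card> element."""
    path = card.findtext("path")  # None when the tag is absent
    if path:
        return ("local", path)
    return ("gdrive", card.findtext("id"))


order = ET.parse("cards.xml").getroot()
for card in order.iter("card"):
    print(image_source(card))
```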
I appreciate your consideration of backwards compatibility. However, now that you're rewriting the client anyway would be the best time to fix these things. Also, I believe most people won't get into trouble with a change, because they can a) still grab an older release or b) use your web application anyway, to which we can apply the same changes. So I wouldn't be too defensive when aiming for a good and future-oriented change. :)
I'm also working on some ideas to bundle the client with Docker. This might further reduce the chance of picking incompatible tool versions. I might come back to you with some Django-related questions, because I'm considering adding a button to launch the client directly from the web interface, and I'm not too familiar with Django / web development.
Oh dear, looks like there's several cooks in the kitchen on this 😅
I'll have a look later at adding HTTP download support to the rewritten autofill tool. I've also started on abstracting sources to allow for different parameters for different source types (like drive id and drive link, or URL, or local file path) on the web side of things, but I'm not 100% sure I'm going about it the right way yet. I'll try to get some code up within the next few days if you wanted to take a look!
getting there with this feature! hoping to have a pr up before too long https://github.com/chilli-axe/mpc-autofill/tree/local-file-support
Wow, that's a lot of new code! :smile: Just one question so far. With local files added to the static files like this:

```python
LOCAL_FILE_INDEX = r""  # for example: r"C:\Users\John Doe\Desktop\MPC Cards"
# [...]
STATICFILES_DIRS = [
    os.path.normpath(os.path.join(BASE_DIR, "cardpicker/static")),
    os.path.normpath(LOCAL_FILE_INDEX),
]
```

Does this mean that when I call `python3 manage.py collectstatic`, all cards will be copied?
yes - it works fine in development but will require some more thought for dockerising since you're serving static files with nginx. a few ideas:

1. use `collectstatic` to collect all files that aren't in `LOCAL_FILE_INDEX` to the django static directory, then have nginx serve that directory as well as `LOCAL_FILE_INDEX` on `/static` - might that be possible? for what it's worth, `collectstatic` has an optional argument to ignore files matching a pattern but i can't seem to force it to ignore my local file index.
2. store `LOCAL_FILE_INDEX` in a text file in the base directory so you don't need to modify django settings to configure the local index - this might open up the possibility of copying the value from the configuration file into django settings after running `collectstatic`?
3. the `update_database` script could create low resolution thumbnails of all images in `LOCAL_FILE_INDEX` and django could serve these rather than the full res images - copying the thumbnails with `collectstatic` would then be less costly than copying the full resolution images. might still require a non-trivial amount of storage space though, and i haven't tested how long creating thumbnails in this way might take (rough sketch of the idea below).

open to any suggestions on this!
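rough sketch of idea 3 - Pillow as an assumed dependency, and the thumbnail size and directory names are guesses:

```python
# walk LOCAL_FILE_INDEX and write low-res thumbnails into the static dir
import os

from PIL import Image

LOCAL_FILE_INDEX = "/cards"
THUMB_DIR = "cardpicker/static/thumbnails"
THUMB_SIZE = (400, 560)  # roughly card-shaped; exact size is a guess

os.makedirs(THUMB_DIR, exist_ok=True)
for root, _dirs, files in os.walk(LOCAL_FILE_INDEX):
    for filename in files:
        if not filename.lower().endswith((".png", ".jpg", ".jpeg")):
            continue
        with Image.open(os.path.join(root, filename)) as im:
            im.thumbnail(THUMB_SIZE)  # in-place, preserves aspect ratio
            # note: duplicate filenames across folders would collide here
            im.save(os.path.join(THUMB_DIR, filename))
```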
Intuitively, I was also thinking of the first option. It should be very easy to implement in Docker, so we should just try this one first. The second option would work just as well, but I don't think it's necessary. The third option sounds interesting for performance reasons, but we should first try the simple way before we blindly optimize for something that might not be necessary in the first place.
I will check out the branch soon and give it a try. Then I'll also see if there are other things that we'll need to consider.
Hey there, thx for working on this, guys!

Just a small input here (I might be too late to the party): I would recommend a two-step process:

1. install and deploy the MPCAutoFill server with an "empty image database"
2. run a program / script to scrape image URLs and fill the database from a third party (Google Drive, a static file server, etc.)

The main benefits are:

The only downside is that if someone updates files on the third party, the scraper needs to be re-run.

For static file serving, as you guys are already using Docker, I would recommend using something like this: https://hub.docker.com/r/halverneus/static-file-server
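For example, if the third party is nginx with `autoindex_format json` enabled (as mentioned earlier in this thread), step 2 could look roughly like this (sketch only; the base URL is an assumption and error handling is omitted):

```python
# Rough sketch of a scraper against an nginx JSON autoindex.
import requests


def crawl(base_url):
    """Recursively yield file URLs from an nginx autoindex_format json listing."""
    for entry in requests.get(base_url).json():
        if entry["type"] == "directory":
            yield from crawl(f"{base_url}{entry['name']}/")
        else:
            yield f"{base_url}{entry['name']}"


for url in crawl("http://static-file-server:8080/cards/"):
    print(url)  # a real scraper would insert these into the database
```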
For people that already have directories with only the images they want to use, and don't want to have to use the web interface, I hacked together a script. It can load front images from one directory, and back images from another. It does require some editing, and manually setting the `slot` of card backs after the XML is generated.

https://gist.github.com/rsullivan00/df968d764101a84244b4b1a06caecf79
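The general idea is roughly this (a simplified sketch, not the actual gist - the real script also loads backs and needs the manual slot edits mentioned above):

```python
# Build an order XML from a directory of front images, using the same
# tags as the example order earlier in this thread.
import os
import xml.etree.ElementTree as ET


def build_order(front_dir):
    order = ET.Element("order")
    fronts = ET.SubElement(order, "fronts")
    for slot, filename in enumerate(sorted(os.listdir(front_dir))):
        card = ET.SubElement(fronts, "card")
        ET.SubElement(card, "id").text = os.path.join(front_dir, filename)
        ET.SubElement(card, "slots").text = str(slot)
        ET.SubElement(card, "name").text = filename
    return ET.ElementTree(order)


build_order("fronts").write("cards.xml")
```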
While Google Drive is a nice interface in a distributed setup, for the time being it would be helpful to have some alternative for local execution of the web application. Without a collection of Google Drives available, it would be helpful if one's own files could be offered easily, without the need for a (large, paid) Google Drive. I could imagine having a local folder as an additional source of images, or even some other web-space. While I would like to help and contribute to developing such a feature, I think this topic requires the expertise of @ndepaola to judge what is doable and reasonable. Maybe we can do some brainstorming here. I was thinking of a few possibilities:
1. A local folder as an additional source of images.
2. Some other web-space as a source of images, downloaded as usual through `autofill.exe` for MPC upload.

Anyway, I think all methods would require adding an additional field to `drives.csv` and going from there.

What do you think? Is this something worth targeting? Doable in reasonable effort? Or is this rather a different project altogether?