fossasia / open-event-server

The Open Event Organizer Server to Manage Events https://test-api.eventyay.com
https://api.eventyay.com
GNU General Public License v3.0

Research and propose best practices for storing files #1050

Closed mariobehling closed 8 years ago

mariobehling commented 8 years ago

Follow up to https://github.com/fossasia/open-event-orga-server/issues/1020

In the documentation about S3, it is mentioned that a solution to keep data distinct is to use subfolders with IDs. Is this the final solution? Please research how other services store data, answer open questions and propose implementations of storage capabilities.

mariobehling commented 8 years ago

@niranjan94 Could you please help to think this through and implement it according to best practices? E.g. any images, audio, etc. - all files should be stored on the server. We cannot depend on images that are called from other websites. This was already an issue for files that were imported and added in wizard step 1. This is also a security risk, and what if other sites are down? Please ensure a same-origin policy. This issue is the cause of a number of other open issues and questions, and it is a very high priority by now.

@aviaryan I really wish you would have followed up on this in an earlier milestone.

niranjan94 commented 8 years ago

@mariobehling alright ... I'll research and give a proposal for implementation by tonight ...

niranjan94 commented 8 years ago

As of now, all the media to be added (logo, background image, slides, video, audio, sponsor logo, speaker image, etc.) is requested from the user as a file and uploaded. We are not accepting file URLs, as we cannot depend on external services.

The only places where we accept URLs right now are the API and import. For the API we could:

  1. Accept URLs. Verify that the file exists at the URL, then save it; otherwise fail and show an error to the user. We will download the file from the URL and upload it to S3 so that we don't have to depend on that URL anymore. (and/or)
  2. Let the user upload the file when he/she is making the API request, and the file will be saved to S3.

For import we could,

  1. Accept URLs. Verify that the file exists at the URL, then save it; otherwise fail and show an error to the user. We will download the file from the URL and upload it to S3 so that we don't have to depend on that URL anymore (see the sketch after these options). (and/or)
  2. Have the user keep the files in the zip itself and give a path to the file in the URL field, which we can use to save the file to S3.
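
A minimal sketch of option 1 for both the API and import (hypothetical helper and bucket names; assumes `requests` and `boto3` are available and that the bucket name comes from configuration):

```python
import io
import os
from urllib.parse import urlparse

import boto3
import requests

# Hypothetical bucket name; the real value would come from the app's config.
BUCKET_NAME = os.environ.get('S3_BUCKET_NAME', 'open-event-media')

s3 = boto3.client('s3')


def mirror_url_to_s3(url, key_prefix):
    """Download the file at `url` and store it under `key_prefix` in S3.

    Returns the resulting S3 URL, or raises ValueError if the URL is
    unreachable, so the caller can report the error to the user (option 1).
    """
    response = requests.get(url, timeout=10)
    if response.status_code != 200:
        raise ValueError('File could not be fetched from %s' % url)

    filename = os.path.basename(urlparse(url).path) or 'file'
    key = '%s/%s' % (key_prefix, filename)

    # Re-upload the bytes to our own bucket so we no longer depend on the external URL.
    s3.upload_fileobj(io.BytesIO(response.content), BUCKET_NAME, key)
    return 'https://%s.s3.amazonaws.com/%s' % (BUCKET_NAME, key)
```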

What happens if files with the same data are downloaded?

I do not understand this. @mariobehling, could you explain?

What happens if we use the "copy" function to copy an event?

Suggested: the file must also be duplicated to reflect the new event ID. Current: the same URL from the previous event is used. ([BUG], as changes to the original event's images would be reflected here too)

What happens if we want to use the same file e.g. a logo in different events?

I suggest it be handled as a different logo. Having the same logo referenced across multiple events would be more cumbersome.

An alternative would be for the organizer to have a central list of logos to which he/she can add or remove entries. When creating an event, he/she could choose one of the available logos.


As for handling large numbers of files and scaling, Amazon S3 is more than capable. The current system of using ID-based key references like /event/1/background should be good enough. This way we ensure there are not multiple images present for one event (and the URL remains the same even if an image is changed).

@mariobehling @aviaryan what do you think ?

aviaryan commented 8 years ago

@niranjan94 I agree with most of your points.

As of now, all the media that is to ..... The only place where we are accepting URLs right now are in the API and import. For that we could: ..... to save the file to S3.

I have also planned the same to be done in #1449 and have discussed it at https://github.com/fossasia/open-event-orga-server/issues/1454#issuecomment-231719079

What happens if files with the same data are downloaded?

I also don't understand this question.

What happens if we use the "copy" function to copy an event?

As @niranjan94 mentioned, yes, there is a bug where the old URLs of resources are still referenced from the new event. We also need to copy the parent event's resources (logos, audio, images) to new URLs in S3 so that both remain distinct. Solution: we could run the copy-event task in the background using Celery. It would copy the resources of parent_event to new URLs and then update the new_event resource URLs to point at them. By resources, I mean uploaded items like logo, audio, video, etc.
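
A rough sketch of that background task (task name, key prefixes, and the model-update step are hypothetical; assumes Celery and boto3):

```python
import boto3
from celery import Celery

# Hypothetical Celery app and bucket; the real project wires these up in its own config.
celery = Celery('tasks', broker='redis://localhost:6379/0')
BUCKET_NAME = 'open-event-media'

s3 = boto3.client('s3')


@celery.task
def copy_event_resources(parent_event_id, new_event_id):
    """Copy every uploaded object of the parent event to keys owned by the new
    event, so the two events stop sharing URLs."""
    old_prefix = 'events/%d/' % parent_event_id
    new_prefix = 'events/%d/' % new_event_id

    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=BUCKET_NAME, Prefix=old_prefix):
        for obj in page.get('Contents', []):
            new_key = new_prefix + obj['Key'][len(old_prefix):]
            # Server-side copy inside S3; no download/re-upload needed.
            s3.copy_object(
                Bucket=BUCKET_NAME,
                Key=new_key,
                CopySource={'Bucket': BUCKET_NAME, 'Key': obj['Key']},
            )
            # Here new_event's stored resource URLs would be updated to point at
            # new_key (omitted, since it depends on the models).
```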

What happens if we want to use the same file e.g. a logo in different events?

I don't think this feature is possible right now, so the current S3 implementation has no problems with it.


I looked on the Internet for blogs and articles on projects using S3 for storage, but wasn't able to find much help.

@niranjan94 @mariobehling One thing I have noticed is that services like GitHub, which use S3 for storage, store files at random keys, i.e. something like:

bucket.s3.amazonaws.com/kasklfl23k24k24k5k244k35kks304130k34k032k423k03443k

One clear advantage is that it prevents users from scraping the bucket to get data (as there is no pattern). I think this is important.

The disadvantage is that it makes it difficult to ensure only distinct data is stored on the server. Avoiding that is the advantage of the current implementation, as @niranjan94 mentioned (quoted below):

This way we are ensuring there are no multiple images present for one event (and also the URL will remain same even if an image is changed).


I thought of a scheme that gives us the best of both worlds. We can keep the current method of setting a KEY for an object, e.g. events/1/sessions/1/audio, and then encrypt the KEY using app.secret_key as the key. Then we append the encrypted string to the KEY. So events/1/sessions/1/audio becomes events/1/sessions/1/audio/<ENCRYPTED>, where ENCRYPTED is the encrypted form of events/1/sessions/1/audio. (ENCRYPTED can be base64-encoded to make sure it is URL friendly.) Since app.secret_key is supposed to be secret, this will be safe from scrapers. It also avoids the problem of keeping only distinct data on the server (as ENCRYPTED will be the same for the same KEY).
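
A small sketch of that idea, using an HMAC of the key instead of encryption (a keyed hash gives the same deterministic, secret-derived, URL-safe suffix; app.secret_key is Flask's standard secret, stubbed here):

```python
import base64
import hashlib
import hmac

# In the real app this would be flask.current_app.secret_key (assumed here).
SECRET_KEY = b'change-me'


def s3_key_with_token(plain_key):
    """Append a deterministic, secret-derived token to an S3 key.

    The token is always the same for the same plain_key, so only one object
    exists per logical key, but the full key cannot be guessed by scrapers.
    """
    digest = hmac.new(SECRET_KEY, plain_key.encode('utf-8'), hashlib.sha256).digest()
    token = base64.urlsafe_b64encode(digest).rstrip(b'=').decode('ascii')
    return '%s/%s' % (plain_key, token)


# events/1/sessions/1/audio -> events/1/sessions/1/audio/<TOKEN>
print(s3_key_with_token('events/1/sessions/1/audio'))
```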

mariobehling commented 8 years ago

Yes, the app secret is a good idea. As part of this discussion we should also think about where else keys are needed. We can get inspiration from WordPress.

mariobehling commented 8 years ago

Just to think this through again: I think in S3 we can also limit the calls to files from a specific domain, can't we?

aviaryan commented 8 years ago

Just to think this through again: I think in S3 we can also limit the calls to files from a specific domain, can't we?

Yes, in S3 we can make it so that GET requests from certain domains are accepted and all others are forbidden (403). Btw, I don't understand why this will be useful.
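
For reference, such a restriction is usually done with a bucket policy that allows s3:GetObject only when the aws:Referer header matches our domain; a sketch applying one via boto3 (bucket name and domain are placeholders):

```python
import json

import boto3

BUCKET_NAME = 'open-event-media'  # placeholder

# Allow GETs only when the Referer header points at our own site;
# requests from anywhere else get 403 Forbidden.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowGetFromOurDomain",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::%s/*" % BUCKET_NAME,
        "Condition": {
            "StringLike": {"aws:Referer": ["https://eventyay.com/*"]}
        },
    }],
}

boto3.client('s3').put_bucket_policy(Bucket=BUCKET_NAME, Policy=json.dumps(policy))
```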

aviaryan commented 8 years ago

@mariobehling @niranjan94 Can you please point out what needs to be done in this issue now? Are we going with using the app secret for encrypted URLs, as suggested in https://github.com/fossasia/open-event-orga-server/issues/1050#issuecomment-231822943, or something else?

niranjan94 commented 8 years ago

@aviaryan the encrypted (or maybe hashed) URLs method sounds good.