bfansports / CloudTranscode

Distributed video and image encoding/transcoding using AWS Step Functions (SFN), FFmpeg and ImageMagick
GNU General Public License v2.0

Possible to use without S3 ? #58

Closed shubhank008 closed 7 years ago

shubhank008 commented 7 years ago

Can we use our own storage solution/server instead of S3?
I think S3 is fine for small use, but for large files its storage/bandwidth costs outdo commercial encoding platforms, or just encoding manually on your own server.

We create some 360 VR videos and want to encode raw 4K videos, which would be roughly 1-4 GB in size.
I just ran the S3 calculator with an estimate of 50 GB of storage per month and around 500 GB of bandwidth (rough upload/download), and the cost came close to ~$50 for S3 alone (which would actually get me a dedicated server at OVH).

I am not trying to bash the project or anything; on the contrary, I am really interested in using it, but the fear of AWS/S3 costs scares me away.

shubhank008 commented 7 years ago

PS: Just to make sure I understood it correctly: if I have 1 GB files and have to encode 50 files in a month,
my total storage use for the month will be 50 GB, and the bandwidth used will be:

- Website/PC to S3 = 50 GB (free, upload to S3)
- S3 to transcode node = 50 GB (charged)
- Transcode node to S3 output = 50 GB (free, upload to S3)
- S3 to PC/server = 50 GB (charged)

If I am correct, the script currently uses S3 to both import and export the files, right?

koxon commented 7 years ago

Hello,

In order to transcode a file, the source doesn't need to be in S3. If the file is served by a webserver over http or https, you can reference the URL of your video and it will be transcoded on the fly.

However, today, the resulting files are uploaded to AWS S3. If you look at the SFN-migration branch, in the file TranscodeAssetActivity.php, there is a function called uploadResultFiles. This is where the transcoded files sitting in the TMP folder on the local machine are uploaded to S3.

You can work on this function to prevent the move to S3 and instead take the files from /tmp and move them somewhere else, locally for example.

Ideally, we should make this an interface and let users implement their own upload classes, while keeping a default implementation that uploads to S3.

This is a good idea. Would you be able to do it?
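
To make the idea concrete, here is a rough hypothetical sketch of what such an interface could look like (none of these names exist in CloudTranscode today; they are made up for illustration):

```php
<?php
// Hypothetical sketch only: the idea is to hide the S3 logic currently
// sitting in uploadResultFiles() behind a small interface.

interface ResultUploader
{
    // Push one transcoded file from the local TMP folder to its final destination.
    public function upload(string $localPath, array $outputSpec);
}

// Default behaviour: keep uploading to S3, as uploadResultFiles() does today.
class S3ResultUploader implements ResultUploader
{
    private $s3;

    public function __construct(\Aws\S3\S3Client $s3)
    {
        $this->s3 = $s3;
    }

    public function upload(string $localPath, array $outputSpec)
    {
        $this->s3->putObject([
            'Bucket'     => $outputSpec['bucket'],
            'Key'        => ltrim($outputSpec['file'], '/'),
            'SourceFile' => $localPath,
        ]);
    }
}

// Alternative implementation: just move the result somewhere on the local filesystem.
class LocalResultUploader implements ResultUploader
{
    private $targetDir;

    public function __construct(string $targetDir)
    {
        $this->targetDir = rtrim($targetDir, '/');
    }

    public function upload(string $localPath, array $outputSpec)
    {
        rename($localPath, $this->targetDir . '/' . basename($localPath));
    }
}
```

TranscodeAssetActivity could then be handed whichever uploader is configured instead of talking to S3 directly.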

You should start using the newest version of CloudTranscode located in the SFN-migration branch. The master branch will soon be decommissioned. The new version is VERY easy to use.

Thanks, Nicolas

shubhank008 commented 7 years ago

Hey, thanks for the quick reply. I just checked the SFN branch and read that you can use HTTP as the input/download source; glad to know that. Using S3 just for output isn't that bad anymore, but yes, giving more freedom over the output location is always better.

I will set up CPE and CT on my server today to test it out; if it works well, I will look into customizing uploadResultFiles. I code for myself 90% of the time, so I can't guarantee the code will be pretty, but at the very least I can add some functions/code to upload to FTP or other destinations, which you can implement in your workflow/YAML (which, TBH, I still have no idea how it works).

Question:

> If the file is served by a webserver using http or https, then you can reference the URL of your video and it will be transcoded on the fly.

The server will still download the file (by itself?) in order to transcode it, yes?

Please excuse any further questions I might have related to installation/setup; my expertise mostly lies in Linux/servers, PHP (intermediate), iOS/Swift, and game development. Just a jack of all trades, master of none.

koxon commented 7 years ago

FFmpeg downloads the file over HTTP and transcodes it as it receives it. I'm not sure whether it stores any temporary data.

Concerning the code, you need to use the SFN-migration branch. It doesn't require SWF or CPE. You can just start your transcoding activity and get going.

You will need to create a workflow in AWS Step Functions; see the example in the state_machine folder. You will need to edit the workflow to reference the SFN tasks that you created in the console.

Then you start your validate and transcode activities and reference the ARN of the associated task in SFN. I suggest you read up on Step Functions: http://docs.aws.amazon.com/step-functions/latest/dg/welcome.html
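
For orientation, a bare-bones activity worker loop built on the AWS SDK for PHP could look roughly like the sketch below; the ARN and the validate_asset() function are placeholders, and CloudTranscode's real workers do more (heartbeats, richer error handling, etc.):

```php
<?php
// Minimal sketch of an SFN activity worker, not CloudTranscode's actual worker code.
require 'vendor/autoload.php';

use Aws\Sfn\SfnClient;

$sfn = new SfnClient([
    'region'  => getenv('AWS_DEFAULT_REGION') ?: 'eu-west-1',
    'version' => 'latest',
]);

// ARN of the activity you created in the Step Functions console (placeholder value).
$activityArn = 'arn:aws:states:eu-west-1:123456789012:activity:ValidateAsset';

while (true) {
    // Long-polls for up to ~60s; taskToken is empty when no task is pending.
    $task = $sfn->getActivityTask([
        'activityArn' => $activityArn,
        'workerName'  => gethostname(),
    ]);

    if (empty($task['taskToken'])) {
        continue;
    }

    $input = json_decode($task['input'], true);

    try {
        $result = validate_asset($input); // your own processing logic (placeholder)
        $sfn->sendTaskSuccess([
            'taskToken' => $task['taskToken'],
            'output'    => json_encode($result),
        ]);
    } catch (\Exception $e) {
        $sfn->sendTaskFailure([
            'taskToken' => $task['taskToken'],
            'error'     => 'ValidationError',
            'cause'     => $e->getMessage(),
        ]);
    }
}
```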

Hope this helps, Nicolas


shubhank008 commented 7 years ago

The new branch is really way, way better than before, and easier to set up too.
I was earlier trying to set things up with CPE to see how it works (and, frankly, to rely less and less on AWS and have more control on my own server) but had to give up after Python/decider errors (I was going the non-Docker route, building/installing everything myself).

Just changed to the new branch and am almost about to kick off my first transcode; just fixing some hard-coded locations so it doesn't rely on Composer.

A small suggestion though: you should really create a table of contents, or rename the "State Machine" section to "Installation" or "Getting Started".
I had read up to "How it Works" and "State Machine" but didn't see that "Run Activities" were follow-up steps, and was actually trying to set up Lambda (as I had seen that word in the master branch's README).

I will try to fork a README/instructions and see if you like it :)

Will try to contribute however I can by adding some different output endpoints and more ffmpeg configuration spec (on that matter, how much can the input_spec already be customized for ffmpeg parameters?)

koxon commented 7 years ago

Cool, feel free to fork and send a pull request with updates to the README. This documentation: http://blog.bfansports.com/CloudTranscode/specs/output.html is not yet up to date with the latest SFN changes, but the JSON format doesn't change.

Concerning the input_spec, the format is limited.

However, in order to support complex ffmpeg commands, you can craft and pass your own FFmpeg command in the input JSON instead, like this:

{... "output_asset": { "type": "VIDEO", "bucket": "ClientA-bucket-out", "file": "/output1/video1.mp4", "s3_rrs": true, "s3_encrypt": true, "custom_cmd": "ffmpeg -i ${input_file} -c:v libx264 -preset slow -crf 22 -c:a copy ${watermark_options} ${output_file}", "watermark": { "bucket": "ClientA-bucket-in", "file": "/no-text-96px.png", "size": "96:96", "opacity": 0.2, "x": -20, "y": -20 } } }

Concerning SFN and the activities: activities are your own standalone scripts running on your server. Lambda functions run in AWS but are very limited (5 min max execution time, 500 MB max storage, etc.), so they are not suited at all to transcoding large assets.


shubhank008 commented 7 years ago

Okay, I think custom_cmd will work for now; I can take input from the user on my client side, build an ffmpeg command, and send it in the input_spec.
I will look into customizing the command in the input_spec as well once I have gotten used to how things work.

Btw, I am not using CPE at all now, so when running the ValidateAsset worker I get this error: Set 'AWS_DEFAULT_REGION' environment variable!

What's the proper way now to set the ENV without hardcoding it or using CPE (which I assume is not needed at all on worker servers)?

shubhank008 commented 7 years ago

Actually, instead of bugging you again and again, let me also ask: where do we set up/configure AWS credentials now?

PS: I have set the ENV using export AWS_DEFAULT_REGION='eu-west-1' for now; I will most probably create a bash script to automate this.

shubhank008 commented 7 years ago

Getting the following error on ValidateAssetActivity:

(screenshot of the error attached)

I am using HTTP as my file source and removed the bucket and file parameters from input_asset in my JSON:

{ "input_asset": { "type": "VIDEO", "http": "http://62.210.141.215/Imouto%20to%20Sono%20Yuujin%20ga%20Ero%20Sugite%20Ore%20no%20Kokan%20ga%20Yabai%2002%20720p.mp4" }, "output_assets": [ { "type": "THUMB", "mode": "snapshot", "bucket": "cloud-encode", "path": "/output/", "file": "thumbnail_sd.jpg", "s3_rrs": true, "s3_encrypt": true, "size": "-1:159", "snapshot_sec": 5 }, { "type": "THUMB", "mode": "snapshot", "bucket": "cloud-encode", "path": "/output/", "file": "thumbnail_hd.jpg", "s3_rrs": true, "s3_encrypt": true, "size": "-1:720", "snapshot_sec": 5 }, { "type": "VIDEO", "bucket": "cloud-encode", "file": "/output/video1.mp4", "s3_rrs": true, "s3_encrypt": true, "keep_ratio": false, "no_enlarge": false, "preset": "720p-generic" } ] }

shubhank008 commented 7 years ago

^ The above error occurs if I use HTTP only (whether from my own server or an S3 HTTP URL), but it works if I remove the http field and instead use the bucket and file parameters.

sportarc commented 7 years ago

I will take a look at this. This is still a work-in-progress branch, so it may be buggy. I will fix it.

Concerning the credentials, follow the AWS documentation on setting up your environment and the AWS CLI.


shubhank008 commented 7 years ago

Yeah, I got the ENV to work by setting export AWS_DEFAULT_REGION='eu-west-1' and also by creating a credentials file in ~/.aws/credentials.

Just to update on the bug: the worker works if I put both bucket/file and http in my JSON, and it will then only use http as the input (ignoring the bucket).
Yet if I leave the bucket/file values empty or remove them from my JSON spec, the script crashes with those errors.

koxon commented 7 years ago

Thanks for the info. That should be an easy fix.

shubhank008 commented 7 years ago

Just checking in to see if this has been fixed in https://github.com/sportarchive/CloudTranscode/commit/c743b3548ed584dc21158ae0d51fe0ccb263d841

In an earlier comment you stated that the code that handles file output (to an S3 bucket for now) can be customized in TranscodeAssetActivity.php.
If I want to add options/types of output to the JSON schema, is TranscodeAssetActivity.php the only file I need to change?

PS: Trying to add an FTP endpoint

koxon commented 7 years ago

The http vs bucket/key issue is now fixed. You can specify either one. If using http, we now use cURL to pull the first 1024 bytes of the file in order to analyse the mime type.
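
Roughly speaking, that kind of check can be done with a ranged request plus mime sniffing; the sketch below only illustrates the idea and is not necessarily the exact code that was committed:

```php
<?php
// Illustration: sniff the mime type from the first 1024 bytes of an HTTP source.
function sniff_http_mime_type($url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RANGE          => '0-1023', // request only the first 1024 bytes
    ]);
    $head = curl_exec($ch);
    curl_close($ch);

    if ($head === false || $head === '') {
        return false;
    }

    $finfo = new finfo(FILEINFO_MIME_TYPE);
    return $finfo->buffer($head); // e.g. "video/mp4"
}
```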

Yes, you need to edit the TranscodeAssetActivity.php file, around the call to `$this->uploadResultFiles($task, $output);`.

It should be fairly easy.
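
For the FTP endpoint idea mentioned above, PHP's built-in ftp extension is probably sufficient. A rough sketch, with placeholder host/credentials, that is not part of CloudTranscode:

```php
<?php
// Rough sketch of an FTP upload step that could replace or complement the S3 upload
// done in uploadResultFiles(). Host, credentials and remote path are placeholders.
function upload_result_via_ftp($localPath, $remotePath)
{
    $conn = ftp_connect('ftp.example.com', 21, 30);
    if ($conn === false) {
        return false;
    }

    $ok = ftp_login($conn, 'ftp-user', 'ftp-password')
        && ftp_pasv($conn, true)                        // passive mode usually works better behind NAT
        && ftp_put($conn, $remotePath, $localPath, FTP_BINARY);

    ftp_close($conn);
    return $ok;
}
```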