FinalsClub / karmaworld

KarmaNotes.org v3.0
GNU Affero General Public License v3.0

Upload HTML directly to S3 bucket, do not dump in database #273

Closed btbonval closed 10 years ago

btbonval commented 10 years ago

The S3 static buckets for beta and prod now have an 'html' directory in the base.

When we have HTML, instead of storing it in the database, we want to write it as a file onto S3 (possibly naming the file by hash). The database would then store only the relative static path, in something like Note.static_relpath.

The Note detail template would then use an IFRAME pointing at {{ STATIC_URL }}html/{{Note.static_relpath}}.
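A minimal sketch of the "name the file by hash" idea, assuming the HTML is available as a text string. The helper name static_relpath_for and the choice of SHA-1 are illustrative assumptions, not something the codebase is known to contain.

```python
import hashlib


def static_relpath_for(html):
    """Return a hash-based file name for an HTML snippet (hypothetical helper)."""
    # Hash the UTF-8 bytes so the same HTML always maps to the same name.
    digest = hashlib.sha1(html.encode("utf-8")).hexdigest()
    return "{0}.html".format(digest)


relpath = static_relpath_for(u"<html><body>HI FRIENDS!</body></html>")
print(relpath)
```

The returned value is what would be stored in something like Note.static_relpath and appended to {{ STATIC_URL }}html/ in the template.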

btbonval commented 10 years ago

We will also need to post-process the HTML already in Production and on the VM and push that out to S3.

This will be a one time deal rather than a recurring thing, so a quick script that doesn't need to hang around should suffice.

btbonval commented 10 years ago

Did a quick search to see how out of fashion IFRAMEs are. Found this question about IFRAME and SEO. Since SEO is a pretty recent topic, there is a good comment in here: http://productforums.google.com/forum/#!topic/webmasters/Y6DyIR7wLXg

Make sure there is an anchor link to IFRAME content on the page with the IFRAME. That sounds like good practice anyway, in case someone turns off IFRAMEs because they're so 1995.
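A rough sketch of what "IFRAME plus anchor link" markup could look like, built in Python purely for illustration. The function name, the link text, and the example URL are assumptions; the real page would presumably do this in the Django template.

```python
from xml.sax.saxutils import quoteattr


def iframe_with_fallback(static_url, relpath):
    """Build IFRAME markup plus a plain anchor to the same document."""
    url = "{0}html/{1}".format(static_url, relpath)
    attr = quoteattr(url)  # returns the value already quoted and escaped
    return (
        "<iframe src={0}></iframe>\n"
        "<a href={0}>View this note as a standalone page</a>"
    ).format(attr)


markup = iframe_with_fallback("https://static.example.com/", "abc.html")
print(markup)
```

The anchor sits outside the IFRAME, so it stays visible even when IFRAMEs are disabled.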

btbonval commented 10 years ago

sanitize_html parses html in-place on the model, i.e. it loads self.html and saves self.html. We probably want to change this into a filter.
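A small sketch of the difference between the in-place style and the filter style. NoteSketch is a stand-in for the real model, and the regex that drops script blocks is only a placeholder for whatever cleanup sanitize_html actually performs; it is not a real sanitizer.

```python
import re

# Hypothetical stand-in for the real cleanup step; it only drops <script> blocks.
SCRIPT_RE = re.compile(r"<script\b.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def filter_html(html):
    """Return a cleaned copy of html without touching any model state."""
    return SCRIPT_RE.sub("", html)


class NoteSketch(object):
    """Minimal stand-in for the Note model, just enough to show both styles."""

    def __init__(self, html):
        self.html = html

    def sanitize_html(self):
        # In-place style: reads self.html and overwrites it.
        self.html = filter_html(self.html)


raw = "<p>hi</p><script>alert(1)</script>"
cleaned = filter_html(raw)  # filter style: the caller decides where the result goes
note = NoteSketch(raw)
note.sanitize_html()        # in-place style, now a thin wrapper over the filter
```

With a filter, the cleaned HTML can go straight to an upload without ever being stored on the model.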

btbonval commented 10 years ago

We probably won't need to batch process HTML across notes, and if we do, the current function will need to be rewritten anyway. Should remove this: https://github.com/FinalsClub/karmaworld/blob/b7ebe2b1d390232a16618977fb3b19cfa790f7b9/karmaworld/apps/notes/management/commands/process_note_html.py

beautifulsoup is part of the requirements. lxml does one thing, which is in sanitize_html. I have to rewrite sanitize_html to be a filter anyway, so if I replace lxml, the world will be a better, brighter place.

btbonval commented 10 years ago

No need to store a URL for the HTML snippet. Note.slug is supposed to be unique. I'm adding unique, not-null to Document.slug which will inherit to Note.slug. The static S3 filename will be based on the Note slug.
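A sketch of deriving the S3 key from the slug. The html/<slug>.html layout matches the paths that show up later in this thread; the function name here is an assumption.

```python
def relative_s3_path(slug):
    """Map a unique Note slug to its key under the bucket's html/ directory."""
    return "html/{0}.html".format(slug)


path = relative_s3_path("certificate-path-validation-testingpdf")
print(path)
```

Because the slug is unique and not null, two notes cannot collide on the same key.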

btbonval commented 10 years ago

Was trying to use from django.core.files.storage import default_storage to write files.

unfortunately:

>>> default_storage.bucket_acl
'public-read'

Our static_s3.py configs are for a read-only API interface, which means there won't be uploading? How the heck does collectstatic work if it can't actually write to the S3 bucket using the static_s3.py settings?

I am missing something key.

btbonval commented 10 years ago

I was trying to create a file by simply opening it and writing to it, as per http://django-storages.readthedocs.org/en/latest/backends/amazon-S3.html#storage

That gives me an IOError even though open is set to write/create mode:

>>> somefile = default_storage.open('bryantestfile.html', 'w')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/django/core/files/storage.py", line 33, in open
    return self._open(name, mode)
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/storages/backends/s3boto.py", line 177, in _open
    raise IOError('File does not exist: %s' % name)
IOError: File does not exist: bryantestfile.html

Maybe Django sets the default_storage to read-only mode for static hosting reasons, but switches it for collectstatic. Clearly the bucket has everything it needs:

>>> default_storage.bucket.get_acl()
<Policy: Andrew (owner) = FULL_CONTROL>

btbonval commented 10 years ago

Nothing in docs about default_storage.acl. Nothing about ACL in django-storages. Only thing about ACL is in s3boto, but we can see that bucket ACL from s3boto is just fine.

guess who has two thumbs and has to read source code. this guy. nn/ \nn

btbonval commented 10 years ago

http://tartarus.org/james/diary/2013/07/18/fun-with-django-storage-backends

btbonval commented 10 years ago

>>> import storages.backends.s3boto
>>> protected_storage = storages.backends.s3boto.S3BotoStorage(acl='private')
>>> with protected_storage.open('html/bryantest.html', 'w') as s3file:
...     s3file.write(html)
... 
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/django/core/files/storage.py", line 33, in open
    return self._open(name, mode)
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/storages/backends/s3boto.py", line 177, in _open
    raise IOError('File does not exist: %s' % name)
IOError: File does not exist: html/bryantest.html
>>> protected_storage.acl
'private'
>>> protected_storage.bucket_acl
'public-read'
>>> protected_storage.bucket.get_acl()
<Policy: Andrew (owner) = FULL_CONTROL>

DO NOT WANT

btbonval commented 10 years ago

http://www.laurii.info/2013/05/improve-s3boto-djangostorages-performance-custom-settings/

btbonval commented 10 years ago

According to the above link, the acl comes from here: http://docs.aws.amazon.com/AmazonS3/latest/dev/ACLOverview.html

'public-read' should still give the owner full control, but the AllUsers group gets read.

It would seem like a bad idea to change the S3 ACL from 'public-read'. Not sure how to access this S3boto stuff as the owner.

btbonval commented 10 years ago

Files are called Keys in the raw s3boto bucket, e.g. default_storage.bucket.get_key('img/asc.gif'). new_key() creates a Key for a file that does not exist on the S3 bucket yet. The Key.open*() methods, which would be nice for writing directly to the S3 file, don't work. Key.send_file() does work. Wrap up the HTML in a little StringIO file-like object and BAM, I just uploaded to S3.

Tested and confirmed. Ugly as junk.

>>> flo = StringIO(html)
>>> nk = default_storage.bucket.new_key('html/bryantest.html')
>>> nk.exists()
False
>>> nk.send_file(flo)
>>> nk.exists()
True
>>> with default_storage.open('html/bryantest.html', 'r') as s3file:
...     print s3file.read()
... 

<html>
<body>
<a href="whaaaat">the</a>
<a href="test" target="_blank">
woop
</a>
<a href="nope" target="werrird">wa</a>
</body>
</html>

btbonval commented 10 years ago

Most of the code is written now. I tried to kick off a process to convert HTML in the database to files on S3, but failed:

(venv)vagrant@vagrant-ubuntu-precise-32:~/karmaworld$ python manage.py populate_s3
Traceback (most recent call last):
...
  File "/home/vagrant/karmaworld/karmaworld/apps/notes/management/commands/populate_s3.py", line 42, in handle
    htmlflo = StringIO(note.html)
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue001' in position 10111407: ordinal not in range(128)

"The StringIO object can accept either Unicode or 8-bit strings, but mixing the two may take some care. If both are used, 8-bit strings that cannot be interpreted as 7-bit ASCII (that use the 8th bit) will cause a UnicodeError to be raised when getvalue() is called." http://docs.python.org/2/library/stringio.html

might as well pass the HTML into BeautifulSoup to see if it can read in the data and output it in consistent UTF-8.

btbonval commented 10 years ago

liar liar pants on fire. It turns out BeautifulSoup does not output UTF-8 by default even though all the docs say it does. Gotta run soup.prettify("utf-8") and suddenly StringIO is pleased.
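The underlying issue is mixing Unicode text with a byte-oriented file-like object. A sketch of the general fix, using a plain str.encode call in place of BeautifulSoup's prettify("utf-8"); the helper name is an assumption.

```python
import io


def html_to_bytes_flo(html):
    """Encode text as UTF-8 and wrap it in a bytes file-like object."""
    if isinstance(html, bytes):
        payload = html  # already encoded, pass through untouched
    else:
        payload = html.encode("utf-8")
    return io.BytesIO(payload), len(payload)


text = u"<p>private-use glyph: \ue001</p>"
flo, size = html_to_bytes_flo(text)
print(size)
```

Note that the byte length differs from the character count once non-ASCII characters are involved, which matters wherever a size has to be reported to the uploader.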

btbonval commented 10 years ago

oh good. random disconnection errors or something. More or less exactly what I want to deal with right now.

(venv)vagrant@vagrant-ubuntu-precise-32:~/karmaworld$ python manage.py populate_s3
Processing html/mit6_007s11_lec07pdf.html
Traceback (most recent call last):
...
  File "/home/vagrant/karmaworld/karmaworld/apps/notes/management/commands/populate_s3.py", line 48, in handle
    newkey.send_file(htmlflo)
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/boto/connection.py", line 910, in make_request
    return self._mexe(http_request, sender, override_num_retries)
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/boto/connection.py", line 872, in _mexe
    raise e
socket.error: [Errno 32] Broken pipe

btbonval commented 10 years ago

well I guess I won't be running this overnight to process.

Can't test if anything worked until I get one Note onto S3 to see if my VM hosts it properly. Can't get one Note onto S3 because broken pipe.

Pushing WIP to origin as feature_html_on_s3 with commit HEAD 87bf8e2441fe35cbdac8cae713f8361557ab8275

btbonval commented 10 years ago

rebased master into branch and ran tests.

... still running.

still running?

btbonval commented 10 years ago

top says the CPU is mostly running SSHD and top. tests deadlocked?

btbonval commented 10 years ago

Looks like the manage.py tests are stuck running Xvfb, which is in turn not running anything (although it should run firefox). Time to double check master still works.

vagrant@vagrant-ubuntu-precise-32:~$ ps ax | grep python
 3219 pts/1    S+     0:02 python manage.py test
 3286 pts/0    S+     0:00 grep --color=auto python
vagrant@vagrant-ubuntu-precise-32:~$ pstree -p | grep -C 3 3219
        |-rsyslogd(828)-+-{rsyslogd}(837)
        |               |-{rsyslogd}(838)
        |               `-{rsyslogd}(839)
        |-sshd(799)-+-sshd(1158)---sshd(1244)---bash(1245)---python(3219)---Xvfb(3242)
        |           `-sshd(2078)---sshd(2164)---bash(2165)-+-grep(3289)
        |                                                  `-pstree(3288)
        |-udevd(323)-+-udevd(399)

btbonval commented 10 years ago

Tests completed on master branch in ~4 minutes.

Something tripped up feature_html_on_s3 branch so that tests deadlock :( No backtraces to help.

btbonval commented 10 years ago

python manage.py test -v 2 seems to be giving better output. Looks to be hung up on Evernote.

Test searching for a school by partial name ... ok
Test upload of an Evernote note ...

Same pstree as before with the dangling Xvfb. Definitely stuck here.

Code: https://github.com/FinalsClub/karmaworld/blob/fe3879edc21a599b51f594e653d4da2adb5f6f88/karmaworld/apps/document_upload/tests.py#L47-L53 calls https://github.com/FinalsClub/karmaworld/blob/fe3879edc21a599b51f594e653d4da2adb5f6f88/karmaworld/apps/document_upload/tests.py#L30-L37

Only place I can imagine it hanging is on convert_raw_document?

btbonval commented 10 years ago

The feature_html_on_s3 branch has no changes in the raw_document app.

btbonval commented 10 years ago

Double ctrl-c got a super long backtrace!

Test upload of an Evernote note ... ^C^CTraceback (most recent call last):
  File "manage.py", line 14, in <module>
    execute_from_command_line(sys.argv)
...
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/django/core/management/base.py", line 255, in execute
    output = self.handle(*args, **options)
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/south/management/commands/test.py", line 8, in handle
    super(Command, self).handle(*args, **kwargs)
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/django/core/management/commands/test.py", line 89, in handle
    failures = test_runner.run_tests(test_labels)
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/django_nose/runner.py", line 155, in run_tests
    result = self.run_suite(nose_argv)
...
  File "/usr/lib/python2.7/unittest/case.py", line 327, in run
    testMethod()
  File "/home/vagrant/karmaworld/karmaworld/apps/document_upload/tests.py", line 53, in testEvernoteConversion
    'mimetype': 'text/enml'})
  File "/home/vagrant/karmaworld/karmaworld/apps/document_upload/tests.py", line 36, in doConversionForPost
    convert_raw_document(raw_document, user=user, session_key=session_key)
  File "/home/vagrant/karmaworld/karmaworld/apps/notes/gdrive.py", line 244, in convert_raw_document
    newkey.send_file(htmlflo)
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/boto/s3/key.py", line 727, in send_file
    query_args=query_args)

Ahh that'd certainly be unique to this branch. Hanging on direct upload to S3. The html folder on the appropriate S3 is empty. Guess I'll play with this feature a little more, it's still leaving cake on the toothpick.

btbonval commented 10 years ago

note for later: It seems worth moving this one function for uploading to S3 from gdrive.py into Note.

btbonval commented 10 years ago

Testing a PDF that renders to 2.87 MiB of HTML using (mostly) what would be performed right now. Upload seems to do zilch.

In [7]: rds = RawDocument.objects.all()
In [14]: fp_file = rds[1].get_file()
In [19]: html = pdf2html(fp_file.read())
Preprocessing: 88/88
Working: 88/88
In [20]: len(html)
Out[20]: 3012503
In [21]: fhtml = notes[0].filter_html(html)
In [22]: len(fhtml)
Out[22]: 3365756
In [23]: filepath = notes[0].get_relative_s3_path()
In [24]: filepath
Out[24]: 'html/certificate-path-validation-testingpdf.html'
In [28]: fhtmlflo = StringIO(fhtml)
In [29]: newkey = default_storage.bucket.new_key(filepath)
In [30]: newkey.exists()
Out[30]: False
In [33]: fhtmlflo.seek(0)
In [35]: def status_update(transmit, maximum): print "transferred {0} / {1}".format(transmit, maximum)
In [36]: newkey.send_file(fhtmlflo, cb=status_update)
transferred 0 / 0
transferred 0 / 0
transferred 0 / 0
...

btbonval commented 10 years ago

Trying something a bit smaller actually uploads a bit, then fails.

In [37]: smallhtml = """
   ....: <html>
   ....: <body>
   ....: HI FRIENDS!
   ....: </body>
   ....: </html>
   ....: """

In [38]: smallhtmlflo = StringIO(smallhtml)

In [39]: len(smallhtml)
Out[39]: 43
In [40]: newkey.send_file(smallhtmlflo, cb=status_update)
transferred 0 / 0
transferred 43 / 0
---------------------------------------------------------------------------
S3ResponseError: S3ResponseError: 400 Bad Request

Sooo there's this "size" parameter. Maybe that'll make the denominator stop being 0?

In [42]: smallhtmlflo.seek(0)
In [43]: newkey.send_file(smallhtmlflo, cb=status_update, size=43)
transferred 0 / 43
transferred 43 / 43
In [44]: newkey.exists()
Out[44]: True

HOT SAUCE!
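Rather than hard-coding size=43, the size can be measured from the file-like object itself. A sketch assuming a seekable bytes object; remaining_size is a hypothetical helper, and the send_file call is shown only as a comment because it needs a live boto Key.

```python
import io
import os


def remaining_size(flo):
    """Return how many bytes are left to read from a seekable file-like object."""
    start = flo.tell()
    flo.seek(0, os.SEEK_END)
    end = flo.tell()
    flo.seek(start)  # restore the position so the upload starts where it should
    return end - start


small = io.BytesIO(b"\n<html>\n<body>\nHI FRIENDS!\n</body>\n</html>\n")
size = remaining_size(small)
# With a boto Key this would then be something like:
#   newkey.send_file(small, cb=status_update, size=size)
print(size)
```

Measuring from the current position also covers the case where someone forgot to seek(0) after a previous attempt: the reported size then matches what is actually left to send.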

btbonval commented 10 years ago

Not sure why my first attempt worked without size: https://github.com/FinalsClub/karmaworld/issues/273#issuecomment-32244223

Deleted file on S3. Trying again with big file, specifying size. Prints two updates and then hangs.

In [45]: newkey = default_storage.bucket.new_key(filepath)
In [46]: newkey.exists()
Out[46]: False
In [47]: fhtmlflo.seek(0)
In [51]: newkey.send_file(fhtmlflo, cb=status_update, size=3365756)
transferred 0 / 3365756
transferred 0 / 3365756

Seems to be a problem with s3boto's send_file. Time to ask the interwebs.

btbonval commented 10 years ago

btw this is where it hangs, writing to SSL:

/usr/lib/python2.7/ssl.pyc in send(self, data, flags)
    196             while True:
    197                 try:
--> 198                     v = self._sslobj.write(data)

btbonval commented 10 years ago

I can totally avoid File Like Objects! http://ferrouswheel.me/2009/12/upload-a-file-to-s3-with-boto/

In [56]: newkey.set_contents_from_string(fhtml, cb=status_update)
transferred 0 / 3365756
transferred 376832 / 3365756
transferred 753664 / 3365756
transferred 1130496 / 3365756
transferred 1507328 / 3365756
transferred 1884160 / 3365756
transferred 2260992 / 3365756
transferred 2637824 / 3365756
transferred 3014656 / 3365756
transferred 3365756 / 3365756
Out[56]: 3365756
In [57]: newkey.exists()
Out[57]: True

confirmed on s3! woop. that was pretty quick to upload.
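A sketch of wrapping that call so the progress callback can be swapped out or recorded. upload_html is a hypothetical helper, and FakeKey is an offline stand-in that only imitates the behavior visible in the session above (a callback taking bytes sent and total, and a byte count as return value); it is not boto's Key.

```python
def upload_html(key, html_bytes, progress=None):
    """Upload html_bytes through key.set_contents_from_string; return the byte count."""
    return key.set_contents_from_string(html_bytes, cb=progress)


class FakeKey(object):
    """Offline stand-in for a boto Key so the sketch runs without a network."""

    def __init__(self):
        self.data = None

    def set_contents_from_string(self, string_data, cb=None):
        total = len(string_data)
        if cb is not None:
            cb(0, total)      # progress report before anything is sent
        self.data = string_data
        if cb is not None:
            cb(total, total)  # progress report once everything is sent
        return total


updates = []
key = FakeKey()
written = upload_html(key, b"<html></html>",
                      progress=lambda sent, total: updates.append((sent, total)))
print(written, updates)
```

Against a real bucket, key would come from default_storage.bucket.new_key(filepath) as in the session above.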

btbonval commented 10 years ago

Rewrote upload code to use set_contents_from_string. Moved upload code into Note. Replaced copy pasta in gdrive.py and populate_s3.py to make use of the upload code in Note. commit 7b61d0712b486ec27c770c84b7e4ae016b6e7591

Running tests again.

btbonval commented 10 years ago

a number of tests errored. It looks like the tests hung, but firefox is actively running at the moment. It's been 5 minutes. :/

btbonval commented 10 years ago

karmaworld.apps.notes.models: ERROR: Error with IndexDen:
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/indextank/client.py", line 131, in create_index
    raise TooManyIndexes(e.msg)
TooManyIndexes: "Too many indexes for this account"

also made a copy/paste mistake.

btbonval commented 10 years ago

A few errors showing up, hanging on the firefox test as before.

This time, however, there are three HTML files on the S3!

The hanging thing bothers me. I'll have to use verbose output to see where that is happening.

btbonval commented 10 years ago

Test upload of an Evernote note ... ok
Test upload of a file with a bogus mimetype ... ok

No files in S3 after these.

The later upload tests have files in S3 after they run.

btbonval commented 10 years ago

Tests didn't hang using verbose output. How bizarre.

Test that Note.save() doesn't make a slug ... ERROR
Search for a note within IndexDen ... ERROR
Test that the slug field is slugifying unicode Note.names ... ok
ERROR
testCreateCourse (test_selenium.AddCourseTest) ... ok

This test appears moot now that slug is unique and not nullable.

======================================================================
ERROR: Test that Note.save() doesn't make a slug
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vagrant/karmaworld/karmaworld/apps/notes/tests.py", line 85, in test_save_no_slug
    self.note.save() # re-save the note
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 54, in execute
    return self.cursor.execute(query, args)
IntegrityError: null value in column "slug" violates not-null constraint

I'm guessing this is due to IndexDen not adding any more indices right now.

======================================================================
ERROR: test suite for <class 'karmaworld.apps.notes.tests.TestNoes'>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/nose/suite.py", line 227, in run
    self.tearDown()
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/nose/suite.py", line 350, in tearDown
...
  File "/home/vagrant/karmaworld/karmaworld/apps/notes/tests.py", line 58, in tearDownClass
    api.delete_index(secret.INDEX)
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/indextank/client.py", line 38, in delete_index
    self.get_index(index_name).delete_index()
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/indextank/client.py", line 152, in delete_index
    _request('DELETE', self.__index_url)
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/indextank/client.py", line 457, in _request
    raise HttpException(response.status, response.body)
HttpException: HTTP 404: ["No index existed for the given name"]

Three failures from error, no true failures.

Time to check it by hand!

btbonval commented 10 years ago

Removed obsolete null Note.slug test, down to 2 errors caused by IndexDen. Can't get much further than this for now.

btbonval commented 10 years ago

Objects uploaded to S3 do not grant permission to open/download them.

Need to do what is in this comment: https://github.com/FinalsClub/karmaworld/issues/68#issuecomment-32556950

btbonval commented 10 years ago

Figured out the IndexDen problem. Back to using Beta's IndexDen and all the tests ran just fine.

btbonval commented 10 years ago

These docs are about as helpful as a bag of wet socks. I guess there are uses for a bag of wet socks, but not many. http://boto.readthedocs.org/en/latest/ref/s3.html

Here's what an Everyone Open/Download policy looks like in s3boto:

In [35]: policy.acl.grants[4].permission
Out[35]: u'READ'
In [36]: policy.acl.grants[4].display_name
In [37]: policy.acl.grants[4].type
Out[37]: u'Group'
In [38]: policy.acl.grants[4].uri
Out[38]: u'http://acs.amazonaws.com/groups/global/AllUsers'
In [39]: policy.acl.grants[4].id
In [42]: policy.acl.grants[4].__class__
Out[42]: boto.s3.acl.Grant

So to make that, it'd be something like

from boto.s3.acl import Grant
# once key exists
policy = newkey.get_acl()
policy.acl.add_grant(Grant(permission=u'READ', type=u'GROUP', uri=u'http://acs.amazonaws.com/groups/global/AllUsers'))

btbonval commented 10 years ago

Permission attempt failed. No errors, but the permissions according to S3 do not include Everyone.

Time for guess and check.

btbonval commented 10 years ago

I think the first problem is that changing the policy as noted above does not save that policy remotely. Probably need to call one of the newkey.set_*acl() commands.

In [12]: newkey.set_acl(policy)
S3ResponseError: S3ResponseError: 400 Bad Request
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>MalformedACLError</Code><Message>The XML you provided was not well-formed or did not validate against our published schema</Message><RequestId>3E57DBBC88D03C8E</RequestId><HostId>W1O4/vy8nDyXEhcgawGHyJrCFmGsaYpqwPcE5CwaLVWVXhuSfB/Suhq/6w0YFMSu</HostId></Error>

Here's a problem. Converting the permission into XML ignores the AllUsers URI.

In [23]: all_read.uri
Out[23]: u'http://acs.amazonaws.com/groups/global/AllUsers'
In [24]: all_read.to_xml()
Out[24]: u'<Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="GROUP"><EmailAddress>None</EmailAddress></Grantee><Permission>READ</Permission></Grant>'

btbonval commented 10 years ago

type is "GROUP". Looking at Boto source code it is case sensitive 'Group'. https://github.com/boto/boto/blob/develop/boto/s3/acl.py#L155-L156

I'm tempted to write a ticket over there, but it's probably one of those things where the standard for the XML or whatever is case sensitive, therefore the Python must be as well.
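To make the case-sensitivity problem concrete, here is a simplified imitation of the serialization behavior seen in the to_xml() output above. It is a sketch reconstructed from that output, not boto's actual code, and it leaves out every grantee type except the group and the fall-through case.

```python
XSI = "http://www.w3.org/2001/XMLSchema-instance"
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"


def grant_to_xml(permission, grantee_type, uri=None, email_address=None):
    """Simplified imitation of the grant serialization seen in this thread."""
    parts = ['<Grant><Grantee xmlns:xsi="{0}" xsi:type="{1}">'.format(XSI, grantee_type)]
    if grantee_type == "Group":  # exact, case-sensitive match
        parts.append("<URI>{0}</URI>".format(uri))
    else:
        # Anything unrecognized, including "GROUP", falls through to e-mail.
        parts.append("<EmailAddress>{0}</EmailAddress>".format(email_address))
    parts.append("</Grantee><Permission>{0}</Permission></Grant>".format(permission))
    return "".join(parts)


bad = grant_to_xml("READ", "GROUP", uri=ALL_USERS)
good = grant_to_xml("READ", "Group", uri=ALL_USERS)
print(bad)
print(good)
```

With "GROUP" the URI is silently dropped and an EmailAddress of None is emitted instead, which is the same shape as the malformed XML that S3 rejected.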

btbonval commented 10 years ago

Here's what the grant XML should look like when it's correct vs what is being generated (identical):

In [48]: oldkey.get_xml_acl()
Out[48]: '...<Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="Group"><URI>http://acs.amazonaws.com/groups/global/AllUsers</URI></Grantee><Permission>READ</Permission></Grant>...'
In [50]: all_read.to_xml()
Out[50]: u'<Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="Group"><URI>http://acs.amazonaws.com/groups/global/AllUsers</URI></Grantee><Permission>READ</Permission></Grant>'

So the problem appears to be with boto's ability to generate either the ACL XML or the Policy XML in a way that satisfies S3.

As an experiment, let's just take the preexisting acl text and write it to the new key.

In [51]: newkey.set_xml_acl(oldkey.get_xml_acl())
In [52]:

Looks good on the S3 management page. I guess I'll just grab that raw XML and put that into the source code. :(

btbonval commented 10 years ago

Fugly fugly fugly but it worked. That XML ACL is huge to be dropping in as a string, but boto is too messed up to do anything else I guess. I see the file on S3 with proper ACLs.

When viewing on the site, the URL asks if I want to download it, rather than showing it in the IFRAME.

Changed over to static S3 properly, and it still pops up a download question. It's an HTML file! Maybe the meta data is wrong?

btbonval commented 10 years ago

Yup. Metadata problem. content-type: application/octet-stream

Gotta make sure these things all get uploaded with content-type as text/html.

That fixes the problem, but it takes forever to download from S3! Also the one I'm looking at looks terrible.
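One possible way to make sure the content type is right at upload time is to derive it from the key name and pass it along as a header. A sketch using the standard mimetypes module; upload_headers_for is a hypothetical helper, and the boto call is left as a comment since it needs a live Key.

```python
import mimetypes


def upload_headers_for(key_name):
    """Guess Content-Type headers for an S3 key name (hypothetical helper)."""
    content_type, _encoding = mimetypes.guess_type(key_name)
    # Unknown extensions fall back to the generic binary type.
    return {"Content-Type": content_type or "application/octet-stream"}


headers = upload_headers_for("html/14_motor1pdf.html")
# With boto, these could be passed along at upload time, for example:
#   newkey.set_contents_from_string(fhtml, headers=headers)
print(headers)
```

Since every file under html/ is HTML, hard-coding text/html would work just as well; guessing from the extension only helps if other file types end up in the bucket later.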

btbonval commented 10 years ago

DIEEEEEE BOTOOOOO!!!! (read as: boto.s3 doesn't do nothin with metadata!?)

In [5]: oldkey = default_storage.bucket.new_key('html/14_motor1pdf.html')
In [6]: oldkey.exists()
Out[6]: True
In [7]: oldkey.metadata
Out[7]: {}
In [8]: oldkey.get_metadata()
---------------------------------------------------------------------------
TypeError: get_metadata() takes exactly 2 arguments (1 given)
In [9]: oldkey.get_metadata('content-type')
In [10]: oldkey.get_metadata('Content-Type')
In [11]: help(oldkey.get_metadata)
Help on method get_metadata in module boto.s3.key:

get_metadata(self, name) method of boto.s3.key.Key instance
In [15]: oldkey.get_metadata(oldkey.name)
In [16]:

btw there is absolutely content-type on every single object, but especially this one, where I explicitly set it.

btbonval commented 10 years ago

Also tried the above with lookup instead of new_key, but I suspect they are exactly the same thing.

btbonval commented 10 years ago

get_metadata is just a wrapper around metadata attribute. https://github.com/boto/boto/blob/develop/boto/s3/key.py#L523-L524

Here's where it gets metadata, during open_read() (not during __init__, of course!). not even a memoized fetching dict, just a dict. https://github.com/boto/boto/blob/develop/boto/s3/key.py#L274-L275

I don't have enough middle fingers for this.

In [25]: oldkey.open_read()
In [26]: oldkey.metadata
Out[26]: {}
In [27]: oldkey.metadata.__class__
Out[27]: dict

btbonval commented 10 years ago

So even if I /read/ the metadata, it'd just be a local cached dict that gets updated. https://github.com/boto/boto/blob/develop/boto/s3/key.py#L526-L534 https://github.com/boto/boto/blob/develop/boto/s3/key.py#L536-L537

It doesn't push that stuff anywhere. ever.