Use the unique message ID as your key

elcolumbio / mlrepricer

Explore pricing data. Share insights and models. Build environment for repricer.


Use the unique message ID as your key #3

Closed Bobspadger closed 6 years ago

Bobspadger commented 6 years ago

In the Jupyter notebook, at 17, you have:

# that's a helpful pseudo multiindex, it's superior since we save a column and it's self-describing
df['messageid'] = df['asin']+'_'+df['time_changed'].dt.strftime('%Y-%m-%d %H:%M:%S.%f')

Amazon provides both a message ID (from the SQS queue) and a unique ID for each update:

result = mws.mws.DictWrapper(message.body)
# print(message.body)
unique_id = result.parsed.NotificationMetaData.UniqueId

It may be easier and more reliable to use this for your index, rather than generating a new one from the ASIN, the time changed, etc.?
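A minimal sketch of what that could look like in pandas, assuming the parsed `UniqueId` has already been stored in a column. The column name `unique_id` and all values below are made up for illustration and are not taken from the notebook:

```python
import pandas as pd

# Hypothetical rows, one per notification; the unique_id values are invented.
df = pd.DataFrame({
    "unique_id": ["a1", "b2", "c3"],
    "asin": ["B000000001", "B000000001", "B000000002"],
    "time_changed": pd.to_datetime([
        "2018-05-01 10:00:00",
        "2018-05-01 10:05:00",
        "2018-05-01 10:05:00",
    ]),
})

# verify_integrity=True makes pandas raise a ValueError if any unique_id repeats,
# so a duplicated notification is caught when the index is built.
indexed = df.set_index("unique_id", verify_integrity=True)

# Look up a single update by its ID.
print(indexed.loc["b2", "asin"])
```

The ASIN and timestamp stay available as ordinary columns, so nothing is lost by indexing on the ID.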

elcolumbio commented 6 years ago

That's reasonable. Here is what I thought:

If we delete the messages, the AWS key has no meaning anymore. The new key speaks to you and helps you understand the data. For example, a snapshot is a groupby of ASINs with the most recent timestamp. The Slack API also uses timestamps extensively. It's really nice for events on a timeline.
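The snapshot idea can be sketched roughly like this. The `asin`, `time_changed` and `messageid` names follow the notebook line quoted above; the `price` column and all values are assumptions for illustration:

```python
import pandas as pd

# Made-up sample data: two updates for one ASIN, one update for another.
df = pd.DataFrame({
    "asin": ["B000000001", "B000000001", "B000000002"],
    "time_changed": pd.to_datetime([
        "2018-05-01 10:00:00",
        "2018-05-01 10:05:00",
        "2018-05-01 09:30:00",
    ]),
    "price": [19.99, 18.49, 7.50],  # hypothetical column
})

# Same composite key as in the notebook: ASIN plus formatted timestamp.
df["messageid"] = df["asin"] + "_" + df["time_changed"].dt.strftime("%Y-%m-%d %H:%M:%S.%f")

# Snapshot: for each ASIN, keep the row with the most recent time_changed.
latest_rows = df.groupby("asin")["time_changed"].idxmax()
snapshot = df.loc[latest_rows].set_index("messageid")
```

Each key in `snapshot` still tells you which ASIN and which point in time the row belongs to.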

We may use that AWS message key for testing, or if we screw something up at the point where we read from the queue. In the Jupyter notebook I would rather not have this AWS message key, since it's dead.

elcolumbio commented 6 years ago

This is not wrong. This is more the concept of a helper column, or it should be a groupby; it's also not unique. Actually, this is only one view or groupby of many. Sure, it's a conceptually important one, but it's also one we never use alone in the end. I have not yet fully understood how many useful views we can generate. I will hopefully come up with a Jupyter notebook.
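A quick way to check the uniqueness point, assuming one notification can contribute several rows for the same ASIN and timestamp (for example one row per offer). The `seller` column and all values are hypothetical:

```python
import pandas as pd

# Made-up data: two rows share the same ASIN and timestamp.
df = pd.DataFrame({
    "asin": ["B000000001", "B000000001", "B000000002"],
    "time_changed": pd.to_datetime(["2018-05-01 10:05:00"] * 3),
    "seller": ["S1", "S2", "S1"],  # hypothetical column
})

# Composite key as built in the notebook.
df["messageid"] = df["asin"] + "_" + df["time_changed"].dt.strftime("%Y-%m-%d %H:%M:%S.%f")

# Is the key usable as a unique index?
key_is_unique = df["messageid"].is_unique

# Treating the key as a groupby label instead: how many rows fall under each key.
rows_per_key = df.groupby("messageid").size()
```

With data shaped like this, the key works as a grouping label rather than as a row identifier.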

For now I don't see a use case; maybe later. We should talk a lot about meaningful groupbys and table views.