mmosc closed this 1 year ago.
Hi!
I modified the code to include a class for the Music4All-Onion dataset. So far, only the `.inter` file gets converted. One remark: for the file with the timestamp, I selected `token` as the format, since the timestamp is given as a date and time, for instance `2013-01-27 21:42:38`. Maybe there is a better way?
There are a couple of ToDos:
- Include code for the `.item` and `.user` files
- Include a README with instructions on how to download the dataset and convert it
- Upload the converted atomic files to your collection of files
I will work on the `.item` part in the next few days.
Thank you for this great library!
Cheers, Marta
Hi!
Thanks for the great contribution! The conversion script looks fine. My only concern is the type of several columns.
- `count:token`. It seems that this column denotes how many times a user listened to the track. Since the feature is numeric and its values can be compared, it might be better as a `float` type. The `token` type is meant for discrete features that are more suitable for lookup embeddings.
- `timestamp:token`. Could the string be converted into a UNIX timestamp in the provided scripts, to make comparing and sorting easier? For example, we can use Python's `time.strptime` and `time.mktime` APIs. Then this column could also be of `float` type.
Looking forward to including Onion :)
Cheers, Yupeng
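A minimal sketch of both type suggestions, assuming a hypothetical input where each interaction row carries a user id, an item id, a listen count and a timestamp string. The header follows RecBole's `name:type` atomic-file convention with `count:float` and `timestamp:float`. It uses `datetime.strptime` so the time zone can be pinned explicitly; treating the raw timestamps as UTC is an assumption, since this thread does not state the time zone. `to_unix` and `write_inter` are illustrative names, not part of RecBole or the PR.

```python
import csv
import io
from datetime import datetime, timezone

# RecBole-style atomic-file header: each column is "name:type".
# count and timestamp are typed as float, following the review comment.
INTER_HEADER = ["user_id:token", "item_id:token", "count:float", "timestamp:float"]


def to_unix(ts: str) -> float:
    """Parse 'YYYY-MM-DD HH:MM:SS' into UNIX seconds.

    Assumption: the raw timestamps are UTC. Change tzinfo if the
    dataset turns out to use another time zone.
    """
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return dt.timestamp()


def write_inter(rows, out) -> None:
    """Write (user, item, count, timestamp_str) tuples as a tab-separated .inter file.

    The row layout is hypothetical; adapt the unpacking to the real source columns.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(INTER_HEADER)
    for user, item, count, ts in rows:
        writer.writerow([user, item, float(count), to_unix(ts)])


if __name__ == "__main__":
    buf = io.StringIO()
    write_inter([("u1", "i42", "3", "2013-01-27 21:42:38")], buf)
    print(buf.getvalue(), end="")
```

With numeric timestamps the column sorts chronologically as plain numbers, which is the convenience the review asks for.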
The header of the `.inter` file was duplicated; I fixed that in the new commits and also added the code for the `.item` conversion. Since there are several files for item features, the code is designed to convert one of them, depending on the filename. I chose this because users might want to download only one feature file and convert just that one, instead of all of them.
Should I create a new pull request?
Meanwhile, I will work on the README.
Cheers, Marta
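One way to realise "convert one feature file, chosen by filename" is sketched below. It is a hypothetical illustration, not the code in the PR: it assumes each feature file is tab-separated with a header row, the first column is the item id and the remaining columns are numeric values. The field name is taken from the file name, and the vector is written as a single RecBole `float_seq` field, whose values are space-separated. `convert_item_file` is an illustrative name.

```python
import csv
from pathlib import Path


def convert_item_file(in_path: str, out_path: str) -> str:
    """Convert ONE item-feature file into a RecBole-style .item atomic file.

    Hypothetical input layout (not checked against the real dataset):
      * tab-separated, with a header row;
      * first column = item id, remaining columns = numeric feature values.
    The field name comes from the input file name, so each feature file
    can be converted on its own. Returns the field name used.
    """
    feature = Path(in_path).stem
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as out:
        reader = csv.reader(src, delimiter="\t")
        next(reader)  # skip the source header row
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        # One token id column plus one float_seq column holding the vector;
        # values inside a *_seq field are separated by spaces.
        writer.writerow(["item_id:token", f"{feature}:float_seq"])
        for row in reader:
            if not row:
                continue  # tolerate blank lines
            item_id, values = row[0], row[1:]
            writer.writerow([item_id, " ".join(str(float(v)) for v in values)])
    return feature
```

Keying the field name on the file name keeps each conversion independent, so a user who downloads a single feature file never needs the others.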
You can directly append the commits in this PR. :) Thanks!!
Thanks for your feedback :)
I appended the new commits:
Have a look and let me know!
Looks good to me! Thanks so much.
By the way, can I download the processed files somewhere? I can upload them to our storage hubs, e.g., Google Drive. Then I'll merge this PR and update our websites, etc.
If not, I can try to convert the original datasets into atomic files myself, and we can then compare the MD5 checksums.
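Comparing MD5 checksums of two independently converted atomic files could look like the sketch below, using only Python's standard `hashlib`. The helper names `md5sum` and `same_files` are illustrative. A match requires byte-identical output, so both sides must use the same column order, float formatting and line endings.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_files(a: str, b: str) -> bool:
    """True if two converted atomic files have identical MD5 digests."""
    return md5sum(a) == md5sum(b)
```

On the command line, `md5sum` (Linux) or `md5` (macOS) gives the same hex digest for a quick manual check.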
Thank you @hyp1231 !
The atomic files are not yet available for download anywhere, since I have not processed them all. You can try to convert the original dataset, as you mentioned.