alex9311 / information-retrieval

TU Delft, Masters Software Technology, Information Retrieval, 3rd Quarter 2015

Database schema #35

Closed alex9311 closed 9 years ago

alex9311 commented 9 years ago

hey all,

I made the database schema. I think it's actually very simple.

If a user sees an idea, an entry is created in the seen table. If the user upvoted the idea, the upvote field is set to 1 in that entry in the seen table.
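For reference, here is a minimal sketch of the seen/upvote behaviour described above, using Python's built-in sqlite3 module with an in-memory database. The column names (user_id, idea_id) and the helper functions record_view and record_upvote are assumptions for illustration; the thread only names the seen table and its upvote field, and the project's actual database engine may differ.

```python
import sqlite3

# Hypothetical column names; only the "seen" table and its "upvote"
# field are named in the discussion.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE seen (
        user_id INTEGER NOT NULL,
        idea_id INTEGER NOT NULL,
        upvote  INTEGER NOT NULL DEFAULT 0,  -- 1 if the user upvoted
        PRIMARY KEY (user_id, idea_id)
    )
    """
)


def record_view(conn, user_id, idea_id):
    # Create the entry the first time a user sees an idea;
    # repeated views of the same idea leave the row untouched.
    conn.execute(
        "INSERT OR IGNORE INTO seen (user_id, idea_id) VALUES (?, ?)",
        (user_id, idea_id),
    )


def record_upvote(conn, user_id, idea_id):
    # Make sure the entry exists, then set the upvote flag on it.
    record_view(conn, user_id, idea_id)
    conn.execute(
        "UPDATE seen SET upvote = 1 WHERE user_id = ? AND idea_id = ?",
        (user_id, idea_id),
    )


record_view(conn, 1, 10)    # user 1 sees idea 10
record_upvote(conn, 1, 11)  # user 1 sees and upvotes idea 11
rows = conn.execute(
    "SELECT user_id, idea_id, upvote FROM seen ORDER BY idea_id"
).fetchall()
```

The composite primary key is one possible choice: it keeps a single row per user and idea, so an upvote updates the existing entry instead of adding a second one.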

Does this look like it covers what we'll need?

(image: sparked_db_schema)

millenniumproof commented 9 years ago

It looks a bit too simple. I think we should already take the CrowdFlower and Dandelion steps into account. We can treat them like black boxes, defining just the input and output:

(Idea: id + text + image) -> CrowdFlower -> (id_Idea + Rejected/Accepted + Reason)

So the Idea table should have fields for 'Rejected/Accepted' and 'Reason'. We should also have a 'Submission' table linking the id_User to the id_Idea that the user submitted. I guess the output table from Dandelion would be something like 'id_Idea', 'id_Idea', 'Semantic similarity', 'Syntactic similarity'.
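One way to pin down the black-box inputs and outputs is to write them as plain record types. This is only a sketch: the class names, the field types, and the fake_crowdflower stand-in are all made up for illustration and have nothing to do with the real CrowdFlower or Dandelion APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdeaInput:
    # What goes into the screening step: id + text + image.
    idea_id: int
    text: str
    image: Optional[str]  # assumed here to be a path or URL


@dataclass
class ScreeningResult:
    # What comes out: id_Idea + Rejected/Accepted + Reason.
    idea_id: int
    accepted: bool
    reason: Optional[str]


@dataclass
class SimilarityRow:
    # Guessed shape of the similarity output: a pair of ideas plus two scores.
    idea_id_a: int
    idea_id_b: int
    semantic_similarity: float
    syntactic_similarity: float


def fake_crowdflower(idea: IdeaInput) -> ScreeningResult:
    # Hypothetical stand-in for the real job, useful for wiring up the
    # database before the job exists: rejects empty text, accepts the rest.
    if not idea.text.strip():
        return ScreeningResult(idea.idea_id, False, "empty text")
    return ScreeningResult(idea.idea_id, True, None)


ok = fake_crowdflower(IdeaInput(1, "Solar-powered bike lights", None))
bad = fake_crowdflower(IdeaInput(2, "   ", None))
```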

We also thought it would be nice if the user could see how many votes an idea got each day. That way the user could see in a graph how well the idea is doing over time and when it needs more promotion. Or maybe we keep that as a nice-to-have/future work.

alex9311 commented 9 years ago

Thanks for the feedback! I knew I was missing things!

How about this? I added a date field to the seen table so we could count the number of votes in a day if we wanted.
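With a date on each seen entry, the per-day count could look roughly like the query below. This is a sketch using sqlite3 with made-up rows; the column name seen_date and the function votes_per_day are assumptions, and it treats the stored date as the day the vote happened, which only holds if the date is written (or updated) when the user upvotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seen ("
    "user_id INTEGER, idea_id INTEGER, upvote INTEGER, seen_date TEXT)"
)
# Invented sample data: four users see idea 10, three of them upvote it.
conn.executemany(
    "INSERT INTO seen VALUES (?, ?, ?, ?)",
    [
        (1, 10, 1, "2015-03-09"),
        (2, 10, 1, "2015-03-10"),
        (3, 10, 0, "2015-03-10"),
        (4, 10, 1, "2015-03-10"),
    ],
)


def votes_per_day(conn, idea_id):
    # Sum the 0/1 upvote flags per date for one idea.
    return conn.execute(
        "SELECT seen_date, SUM(upvote) FROM seen "
        "WHERE idea_id = ? GROUP BY seen_date ORDER BY seen_date",
        (idea_id,),
    ).fetchall()


daily = votes_per_day(conn, 10)
```

The result is a list of (date, votes) pairs that could feed a graph directly.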

(image: screen shot 2015-03-10 at 8 10 58 pm)

millenniumproof commented 9 years ago

We need a separate table for the screening results of the Sanitation Check on CrowdFlower and the similarity check using Lucene.

millenniumproof commented 9 years ago

There is no data field for the 'reason of rejection' in the screening results database table.

alex9311 commented 9 years ago

I'm about to go into a meeting but I'll add the 'reason' field so the database reflects the schema above.

You're saying you want two Screening_results tables? One for Lucene and one for CrowdFlower with the same fields?

I'll be resetting the DB tonight to implement these changes, so don't submit ideas you want to save.

millenniumproof commented 9 years ago

Yeah, two Screening_results tables would be good. The Lucene one will be different, I think, maybe an extra confidence value or something, but for now just an accepted (yes/no) will do.

alex9311 commented 9 years ago

Okay, I reset the database and made the changes mentioned above: https://github.com/alex9311/TUD-Information-Retrieval-Group-02/commit/45162985d4cb6a3f4cc292eef009d62cc17739cc

The accepted field is still an integer though (0 or 1). Is that OK?

millenniumproof commented 9 years ago

Hmm, well the professor did say that on a human computation job multiple people should perform each task, meaning we get a 'Yes', 'No', or 'Not Sure'. The ideas that get a 'Not Sure' result would then be checked by a site admin or something. Maybe make it 1 = 'Yes', 0 = 'No', -1 = 'Not Sure'.
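A rough sketch of how several workers' answers could be collapsed into the 1 / 0 / -1 encoding: take the majority answer if enough workers agree, otherwise fall back to 'Not Sure'. The function name aggregate and the 0.7 agreement threshold are assumptions for illustration, not anything the professor or CrowdFlower prescribes.

```python
from collections import Counter

YES, NO, NOT_SURE = 1, 0, -1


def aggregate(judgments, min_agreement=0.7):
    """Collapse per-worker answers (1 or 0) into YES, NO, or NOT_SURE.

    min_agreement is an assumed threshold: the share of workers that
    must give the same answer before it is accepted as the result.
    """
    if not judgments:
        return NOT_SURE
    label, votes = Counter(judgments).most_common(1)[0]
    if votes / len(judgments) >= min_agreement:
        return label
    return NOT_SURE


unanimous = aggregate([1, 1, 1])
mostly_no = aggregate([0, 0, 0, 1])
split = aggregate([1, 0])
```

Ideas that come back as NOT_SURE would be the ones queued for a site admin.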

alex9311 commented 9 years ago

That sounds good to me! Right now it's INT(2) for that field, which I think means two bits, so a max of 4? I'm not positive how it works.
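On the INT(2) question: if the database is MySQL (an assumption, since the thread doesn't name the engine), the number in parentheses is a display width, not a size in bits. An INT column is stored as a 4-byte signed integer either way, so 1, 0, and -1 all fit. The arithmetic, as a small sketch with a hypothetical helper signed_range:

```python
INT_BYTES = 4      # storage size of a MySQL INT, regardless of the (2)
TINYINT_BYTES = 1  # storage size of a MySQL TINYINT


def signed_range(n_bytes):
    # Smallest and largest values of a signed integer of the given size.
    bits = n_bytes * 8
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


low, high = signed_range(INT_BYTES)
fits_int = all(low <= v <= high for v in (1, 0, -1))

tiny_low, tiny_high = signed_range(TINYINT_BYTES)
fits_tinyint = all(tiny_low <= v <= tiny_high for v in (1, 0, -1))
```

So the current column already handles the three values, and a signed TINYINT would be enough if a smaller type is preferred.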