TechnionYP5777 / Bugquery

Bug query
SQL DB creation #9

Closed ZivIzhar closed 7 years ago

ZivIzhar commented 7 years ago

The Stack Overflow data dump is downloadable in XML format from here: https://archive.org/details/stackexchange To use it, we need to convert it to MySQL.

tonylekhtman commented 7 years ago

We ramped up on MySQL and on XML-to-MySQL conversion, and we tried converting a small XML file to MySQL. Right now, though, we lack the resources to continue.

tonylekhtman commented 7 years ago

We filtered out the irrelevant fields, but we need to make final decisions about them with the rest of the team.

yossigil commented 7 years ago

No progress for 9 days????? @idabran @zvili

tonylekhtman commented 7 years ago

We had problems processing the data on our computers, and we are waiting for @AdiOmari to give us access to a server.

yossigil commented 7 years ago

Delays of this sort should be reflected in issues. In many cases, it is possible to put the team's resources, its people, to use while they wait. But if the team does not report that it is waiting, the impression is that they are just doing nothing. If all you did was wait, this is NOT GOOD. If you did something else and did not report it, this is also NOT GOOD, but it can be fixed with later reports.

ZivIzhar commented 7 years ago

The relevant fields from the data have been decided.

tonylekhtman commented 7 years ago

The fields we chose to take from the XML are: Id, PostTypeId, ParentId, AcceptedAnswerId, Score, Body, Title, Tags, AnswerCount.

We wrote the SQL code that gets these fields. We are now working on the Java code that will call our SQL command.
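
For readers following along, here is a minimal Java sketch of pulling just those nine fields out of the dump. It is not the code from this issue. It assumes the usual Stack Exchange dump layout, where each post is a `<row .../>` element carrying its fields as attributes; the class name `PostRowReader` and the callback-style `readPosts` method are hypothetical names chosen for illustration. It uses the JDK's streaming StAX parser so that a very large Posts.xml is never loaded into memory at once.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams <row> elements and keeps only the chosen fields.
final class PostRowReader {
    // The nine attributes listed in the comment above.
    static final String[] FIELDS = {
        "Id", "PostTypeId", "ParentId", "AcceptedAnswerId", "Score",
        "Body", "Title", "Tags", "AnswerCount"
    };

    // Calls sink once per <row>; attributes missing from a row map to null.
    static void readPosts(Reader in, Consumer<Map<String, String>> sink) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The dump needs no DTD processing, so switch it off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        try {
            XMLStreamReader xml = factory.createXMLStreamReader(in);
            try {
                while (xml.hasNext()) {
                    if (xml.next() == XMLStreamConstants.START_ELEMENT
                            && "row".equals(xml.getLocalName())) {
                        Map<String, String> post = new LinkedHashMap<>();
                        for (String field : FIELDS) {
                            post.put(field, xml.getAttributeValue(null, field));
                        }
                        sink.accept(post);
                    }
                }
            } finally {
                xml.close();
            }
        } catch (XMLStreamException e) {
            // Rethrown unchecked only to keep the sketch short.
            throw new IllegalArgumentException("malformed XML", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<posts><row Id=\"4\" PostTypeId=\"1\" Score=\"10\" Title=\"t\"/></posts>";
        readPosts(new StringReader(sample), post -> System.out.println(post.get("Id")));
    }
}
```

In a real import, the callback would hand each map to a batched INSERT rather than print it; that part depends on the table layout, which this thread does not show.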

yossigil commented 7 years ago

Any related commits?

tonylekhtman commented 7 years ago

Soon. We are having problems connecting to our local MySQL server.

ZivIzhar commented 7 years ago

Regarding the MySQL server on the csl server: in order not to affect Adi's databases, we can run another instance, as shown here: http://dev.mysql.com/doc/refman/5.7/en/multiple-windows-command-line-servers.html

tonylekhtman commented 7 years ago

@AdiOmari where should we save Posts.xml? I guess we shouldn't add it to GitHub (50 GB), but then the code that converts it to MySQL will be broken on computers other than the server.

tonylekhtman commented 7 years ago

Converting the Stack Overflow Posts.xml to MySQL (the big DB, not the demo we have used so far). The process will probably take a few hours.

tonylekhtman commented 7 years ago

Successfully imported the XML into the MySQL server. The server is accessible via localhost:3306.

AdiOmari commented 7 years ago

@ZivIzhar @tonylekhtman ".executeQuery("SELECT * FROM so_posts WHERE Id < " + (i + 10000) + " AND Id > " + i);" You should use the SQL "LIMIT" clause instead: link.

tonylekhtman commented 7 years ago

I tried using it, but it returned the same rows for different ranges (you also need to add ORDER BY(Id)). It works the way I wrote it, and I have almost finished importing. I can look into why it doesn't work with LIMIT, but the DB is already ready.

AdiOmari commented 7 years ago

@tonylekhtman if it is the import code then fine, just make sure your query function (used by Yonatan and Roded) uses LIMIT. (LIMIT works like this: LIMIT START_INDEX, NUMBER_OF_ROWS_NEEDED.)
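
As an illustration of the LIMIT form described above, here is a rough Java sketch of a paged query. It is not the project's query function: the class `PostPager`, the methods `offsetOf` and `fetchTitles`, and the choice to select only Id and Title are assumptions made for the example; only the table name so_posts and the column names come from this thread. The ORDER BY Id is included because, without a fixed ordering, MySQL does not guarantee which rows a given LIMIT window returns.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing LIMIT START_INDEX, NUMBER_OF_ROWS_NEEDED.
final class PostPager {
    // First placeholder: rows to skip. Second placeholder: rows to return.
    static final String PAGE_SQL =
        "SELECT Id, Title FROM so_posts ORDER BY Id LIMIT ?, ?";

    // Zero-based page number to row offset; long avoids int overflow on big tables.
    static long offsetOf(int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        return (long) page * pageSize;
    }

    // Fetches the titles on one page; the caller supplies an open connection.
    static List<String> fetchTitles(Connection conn, int page, int pageSize)
            throws SQLException {
        try (PreparedStatement st = conn.prepareStatement(PAGE_SQL)) {
            st.setLong(1, offsetOf(page, pageSize));
            st.setInt(2, pageSize);
            try (ResultSet rs = st.executeQuery()) {
                List<String> titles = new ArrayList<>();
                while (rs.next()) {
                    titles.add(rs.getString("Title"));
                }
                return titles;
            }
        }
    }
}
```

One design caveat: with a large START_INDEX, MySQL still has to walk past all the skipped rows, so deep pages get slower. For a full-table scan such as the import, filtering on an Id range can be the cheaper option.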

tonylekhtman commented 7 years ago

OK