Erudika / scoold

A Stack Overflow clone for teams (self-hosted or hosted)
https://scoold.com
Apache License 2.0
861 stars 239 forks

Stackoverflow Import #406

Closed SHogenboom closed 1 year ago

SHogenboom commented 1 year ago

I’m still having issues getting the Stackoverflow Teams Import to work. Specifically my issues are:

  1. The viewCount is not imported correctly. The viewCount is a number, but after import it shows 0. This could be due to an issue with mapping the parameters. Is there a place where I can find which Stackoverflow parameters map to which Scoold parameters?

  2. I cannot add the questions to a space. I created a space called pb8x14-literatuurstudie and added the following parameter to the JSON schema: "space" = "scooldspace:pb8x14-literatuurstudie". Unfortunately, the questions remain assigned to the default space. Any suggestions on what I’m doing wrong here?

  3. How do I import users with email / password settings? Currently the Stackoverflow Teams export does not include these details, but I have them from another database. I want to import user accounts so the posts are assigned to the correct users. However, without importing login credentials, these users can never log back in to their old accounts?!

  4. Which userTypes can we use upon import? Would be appreciated if we could assign users to moderator/admin/normal upon import.

  5. Can we add badges to the user import? If so, what should the json-schema look like? I would try the same as with the spaces, but as that doesn’t work I’m not sure how to achieve it. Having said that, what would a general import file look like? There is no documentation on how the json imports work. A pointer to the code would also help me to see what happens ;)

albogdano commented 1 year ago

Thanks for reporting these issues! I was able to fix them.

  1. Fixed - this field was previously ignored.
  2. Fixed - you can now add the space field, but the format must be "scooldspace:pb8x14-literatuurstudie:PB8x14 Literatuur Studie", where the last part after the second colon is the display name of the space. You must also add the same space field to the answers, and it must match that of the parent question.
  3. I added support for an optional passwordHash field now but it must be hashed using the BCrypt algorithm. Also users should be able to reset their passwords by asking Scoold to send them a reset password link.
  4. You can now also import moderators, whereas before only admins and regular users were supported. Modify the users.json file so that "userTypeId" is set to one of "Admin", "Mod", or "Registered".
  5. Partially fixed - badges cannot be imported easily because Scoold uses different types of badges and SO badges cannot be converted directly to Scoold badges. You can now add the spaces field to each user object in users.json. This field must be an array of strings, each in the same space format as described above.
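
The space format from points 2 and 5 can be sketched in Python as follows. This is only an illustration of the string format described above, not Scoold code: the helper names (`space_ref`, `assign_space`, `assign_user_spaces`) and the sample objects are hypothetical, and the sketch assumes you already know which answers belong to which question.

```python
def space_ref(space_id: str, display_name: str) -> str:
    """Build a space reference in the 'scooldspace:<id>:<display name>' format."""
    return f"scooldspace:{space_id}:{display_name}"


def assign_space(question: dict, answers: list, space: str) -> None:
    """Set the same 'space' value on a question and on each of its answers."""
    question["space"] = space
    for answer in answers:
        answer["space"] = space  # must match the parent question's space


def assign_user_spaces(user: dict, spaces: list) -> None:
    """Set the 'spaces' field of a user object to an array of space references."""
    user["spaces"] = list(spaces)


# Made-up post and user objects; real export entries contain more fields.
space = space_ref("pb8x14-literatuurstudie", "PB8x14 Literatuur Studie")
question = {"id": 1}
answers = [{"id": 2}, {"id": 3}]
user = {"id": 2, "userTypeId": "Mod"}

assign_space(question, answers, space)
assign_user_spaces(user, [space])
```

Building the string in one place makes it harder for a question and its answers to end up with slightly different space values.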

The code for importing from SO can be found in AdminController.java.

SHogenboom commented 1 year ago

When will the changes be added to a new release? I use the .jar file to run the instance locally for testing. I'm not sure how to get a .jar file from a GitHub clone.

albogdano commented 1 year ago

I will probably release a new version next week. You can always run mvn clean package which will build the JAR package in the ./target directory of your cloned repo. You will need to install Apache Maven first and the Java SDK.

albogdano commented 1 year ago

Scoold 1.56.0 has been released with the changes above.

SHogenboom commented 1 year ago

@albogdano is it possible that the changes introduced a different error? The import stays ‘pending’ - even for very small imports. This was not the case in previous versions.

albogdano commented 1 year ago

I will check that. Do you have the checkbox "Delete all before import" checked?

albogdano commented 1 year ago

Any errors in the Scoold logs? I cannot reproduce the issue, so it probably occurs only with specific input data. Did you modify the JSON data before importing it? If any of the modified values use a data type different from what Scoold expects, it will throw an exception and the whole import job will stop.

SHogenboom commented 1 year ago

Back again, sorry for the delays in between, but the issue persists… I am modifying the JSON, but only in ways which worked before the latest changes to the SO import code. I’m now getting the error:

2023-10-11 12:11:33,094 [ERROR] c.e.s.controllers.AdminController - Failed to import questions.zip
java.lang.NullPointerException: Cannot invoke "String.split(String)" because "t" is null
    at com.erudika.scoold.controllers.AdminController.importPostsFromSO(AdminController.java:583)
    at com.erudika.scoold.controllers.AdminController.importFromSOArchive(AdminController.java:544)
    at com.erudika.scoold.controllers.AdminController.lambda$restore$12(AdminController.java:430)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1623)

Hope this helps and you’ll be able to fix it. Even the original SO import - no changes to the JSON - is not working anymore! I realise I’m one of the few people using the option, but as you advertise it I would expect it to work… Sorry that I keep finding new issues.

albogdano commented 1 year ago

Thanks again, Sally! The issue above should now be fixed in the main branch. A new release is coming soon.

SHogenboom commented 12 months ago

@albogdano ; Users are causing a problem (hopefully the last):

2023-11-09 18:42:28,724 [ERROR] c.e.s.controllers.AdminController - Failed to import users.zip
java.lang.ClassCastException: class java.util.LinkedHashMap cannot be cast to class java.lang.String (java.util.LinkedHashMap and java.lang.String are in module java.base of loader 'bootstrap')
    at com.erudika.scoold.controllers.AdminController.importUsersFromSO(AdminController.java:677)
    at com.erudika.scoold.controllers.AdminController.importFromSOArchive(AdminController.java:553)
    at com.erudika.scoold.controllers.AdminController.lambda$restore$12(AdminController.java:430)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1623)

With regard to user import, I have two additional questions:

albogdano commented 12 months ago

The exception happens because Scoold expects these two properties to be strings: creationDate and userTypeId. Please check that in your users.json file those properties are indeed strings.
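
A quick way to run that check is a small script like the one below. It is a minimal sketch, assuming users.json holds a top-level array of user objects; the function name `find_non_string_fields` and the inline sample data are made up for illustration, and in practice you would load your own file with `json.load()`.

```python
import json

# The two properties that are expected to be strings.
STRING_FIELDS = ("creationDate", "userTypeId")


def find_non_string_fields(users: list) -> list:
    """Return (index, field, type name) for each checked field that is not a string."""
    problems = []
    for index, user in enumerate(users):
        if not isinstance(user, dict):
            # An entry that is not a JSON object at all is reported with field None.
            problems.append((index, None, type(user).__name__))
            continue
        for field in STRING_FIELDS:
            if field in user and not isinstance(user[field], str):
                problems.append((index, field, type(user[field]).__name__))
    return problems


# Inline sample: the second entry has an object where a string is expected.
sample = json.loads("""
[
  {"id": 2, "creationDate": "2012-09-11T16:00:03Z", "userTypeId": "Registered"},
  {"id": 3, "creationDate": {"value": "2012-09-11T16:00:03Z"}, "userTypeId": "Mod"}
]
""")

for index, field, type_name in find_non_string_fields(sample):
    print(f"user #{index}: {field} is {type_name}, expected a string")
```

The script only reports problems and does not change the data, so it is safe to run against the original export.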

The users.json file contains the account data and accounts.json contains the email for each user id. Scoold has so-called "profiles", which are objects that hold the account data. It also stores basic user data like email and names in Para as user objects. So, with regard to the import, Scoold will create a profile and a user object from the data in users.json and accounts.json.

You can import users from SO, and if they later want to log in with SSO, they have to use the same email address for signing in. If the SSO email differs from the one imported from SO, those users will have two separate accounts. After SSO authentication succeeds, Para will try to find the user by email and then link their SSO ID to the Para user object as an identifier with an id like oa2:{sso_id}. You shouldn't manually link SSO data to user objects, as that is going to be very error-prone. You can just configure SSO in Scoold depending on what type of SSO you want to use. After configuring it, a button will appear on the /signin page. Hope that helps.

SHogenboom commented 11 months ago

@albogdano ; fixed the issues you mentioned - thanks for pointing them out! However, the error persists:

2023-11-13 15:11:26,342 [ERROR] c.e.s.controllers.AdminController - Failed to import users.zip
java.lang.ClassCastException: class java.util.LinkedHashMap cannot be cast to class java.lang.String (java.util.LinkedHashMap and java.lang.String are in module java.base of loader 'bootstrap')
    at com.erudika.scoold.controllers.AdminController.importUsersFromSO(AdminController.java:677)
    at com.erudika.scoold.controllers.AdminController.importFromSOArchive(AdminController.java:553)
    at com.erudika.scoold.controllers.AdminController.lambda$restore$12(AdminController.java:430)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1623)

The json that fails:

{
    "lastAccessDate": "2023-06-13T11:22:40Z",
    "creationDate": "2012-09-11T16:00:03Z",
    "answerCount": 1864,
    "questionCount": 70,
    "goldBadges": 0,
    "silverBadges": 0,
    "bronzeBadges": 0,
    "views": 0,
    "reputation": 0,
    "id": 2,
    "userTypeId": "Registered",
    "accountId": 2,
    "lastLoginDate": "2023-06-13T11:22:40Z",
    "profileImageUrl": "https://www.gravatar.com/avatar/83135b1112faeb77a63b79be2a6ca699?s=128&d=identicon&r=PG&f=y&so-version=2",
    "realName": "gjp",
    "spaces": ["scooldspace:analyseren:Analyseren", "scooldspace:studiematerialen:Studiematerialen", "scooldspace:digitale-leeromgeving:Digitale Leeromgeving", "scooldspace:methodologie:Methodologie", "scooldspace:experimenteel-onderzoek-oeo-pb04x2:Experimenteel Onderzoek (OEO, PB04x2)", "scooldspace:literatuurstudie-ls-pb07x2:Literatuurstudie (LS, PB07x2)", "scooldspace:inleiding-onderzoek-oio-pb02x2:Inleiding Onderzoek (OIO, PB02x2)", "scooldspace:cross-sectioneel-onderzoek-oco-pb08x2:Cross-sectioneel Onderzoek (OCO, PB08x2)", "scooldspace:artikel-schrijven:Artikel Schrijven", "scooldspace:kwalitatief-onderzoek-oko-pb16x2:Kwalitatief Onderzoek (OKO, PB16x2)", "scooldspace:longitudinaal-onderzoek-pb17x2:Longitudinaal Onderzoek (PB17x2)"]
}

Thanks!

albogdano commented 11 months ago

Hm, I tried importing the JSON above but it did not raise the exception. Perhaps there's another JSON object in the users.json file that has a creationDate in the form of an Array? The JSON above seems fine and all fields are in the correct format.

SHogenboom commented 11 months ago

Cheers! It appears to be caused by one of the original .json files from SO which is an empty array. If I only use the newly created files it works as expected. Thanks for all your help!
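
For anyone hitting the same problem, a script along these lines could flag empty-array files before zipping the export. It is a rough sketch, assuming the export is a folder of .json files; the function name `find_empty_array_files` and the file names in the demonstration are made up.

```python
import json
import tempfile
from pathlib import Path


def find_empty_array_files(export_dir: Path) -> list:
    """Return names of .json files in export_dir whose top-level value is an empty array."""
    empty = []
    for path in sorted(export_dir.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        if isinstance(data, list) and not data:
            empty.append(path.name)
    return empty


# Demonstration with temporary files standing in for a real export folder.
with tempfile.TemporaryDirectory() as tmp:
    export_dir = Path(tmp)
    (export_dir / "users.json").write_text('[{"id": 2}]', encoding="utf-8")
    (export_dir / "leftover.json").write_text("[]", encoding="utf-8")
    empty_files = find_empty_array_files(export_dir)
    print(empty_files)
```

Files reported by the script can then be left out of the archive, which matches the workaround of importing only the newly created files.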