cosmocode / dokuwiki-plugin-struct

A new structured data plugin
https://www.dokuwiki.org/plugin:struct
GNU General Public License v2.0

support import from data #42

Open splitbrain opened 8 years ago

splitbrain commented 8 years ago

Users might want to switch from the data plugin to this plugin. Providing a way to move would be nice.

We could simply import the values from the data plugin's SQLite database, but that would leave the data syntax in the pages, so some cleanup would be required. That could be done with search and replace op....

tmo26 commented 8 years ago

Import from data would be highly appreciated, since I have 1130 datasets in the sqlite db.

0bserver commented 7 years ago

During an internship I created this PHP script, which is meant to be called from a terminal. It imports dataentries and datatables into struct and optionally either appends the new struct tables to pages or replaces the old datatables with struct tables. Feel free to build upon my script. It doesn't contain secrets, and I was asked by my boss to upload it here. However, I don't consider myself an expert in DokuWiki, data, or struct, and will probably not be able to give much support with questions that didn't come up with our DokuWiki installation. Therefore I didn't include contact information in the script, but I will follow this thread for a while.

Concerning liability I chose the MIT license. If another license or some changes are required for legal reasons, I'm ready to make them on request.

Attachment: data_to_struct.zip

tmo26 commented 7 years ago

@0bserver Thanks for this script! I tried it today and ran into problems:

0bserver commented 7 years ago

Yes, every file (except the error file, ideally) should contain something (and did with the installation I worked with). You speak of datatables; does that mean that no dataentries have been found? Without those, data_to_struct_keys.txt can't contain anything. In that case my first suspicion is that the regular expression for dataentries might require some tuning, which you can do in the configuration section. E.g. in our case no dataentry had empty lines or white-space before the keys.

If you have very large dataentries, maybe you need to increase the chunk size as well (although on our system going far beyond 600 resulted in silent errors, which means that no matches were found any more, but no errors reported either; there is also a PHP setting for regular expression string length, though changing that isn't recommended officially).

If all that doesn't help I'm afraid you need to do some debugging of the script, since I can only guess.
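For anyone debugging the "silent errors" mentioned above: in PHP, `preg_match_all` returns `false` (not `0`) when PCRE gives up, and `preg_last_error()` reports why. The following is only a minimal sketch of how that could be surfaced. The helper name `match_all_or_fail` is made up for illustration and is not part of the uploaded script.

```php
<?php
// Hypothetical helper (NOT from data_to_struct): run preg_match_all and
// turn a silent PCRE failure into an exception instead of "no matches".
function match_all_or_fail(string $pattern, string $subject): array
{
    $result = preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
    if ($result === false) {
        // preg_last_error() returns one of the PREG_*_ERROR constants,
        // e.g. PREG_BACKTRACK_LIMIT_ERROR when pcre.backtrack_limit is hit.
        throw new RuntimeException('PCRE error code: ' . preg_last_error());
    }
    // With PREG_SET_ORDER each element is one complete match.
    return $matches;
}
```

Comparing the thrown code against `PREG_BACKTRACK_LIMIT_ERROR` and `PREG_RECURSION_LIMIT_ERROR` should show whether the chunk size or the PCRE limits are the problem, rather than the regular expression itself.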

tmo26 commented 7 years ago

Thanks for your explanations! It seems to me that the regex for finding the dataentries is working correctly. Please see http://www.phpliveregex.com/p/ijn for an example dataentry. preg_match_all finds something, but I cannot judge whether the result is what the script expects.

no whitespace before the keys: As you can see, I have whitespace between <key> : <value>. Could this be an issue?

large dataentries: As you can see, my dataentries are quite large. Could this be an issue?

Looking at my dataentry, can you spot anything else that might result in no findings?

0bserver commented 7 years ago

Whitespace between keys and values is accounted for, so that shouldn't be a problem. The size of the entries is almost certainly the reason. Your example entry has about 4500 characters, but in my tests I ran into problems if the complete text to search had much more than 1000 characters. That is why I split the complete text into chunks and then go through all chunks, always testing three consecutive ones, which together shouldn't have much more than 1000-1500 characters.
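The chunking idea described above could look roughly like the sketch below. It is not the code from the uploaded script: the function name `find_in_windows`, the default chunk size of 500, and the deduplication by absolute offset are all assumptions made for illustration. It also assumes the pattern does not use the `/u` modifier, because `str_split` cuts at byte positions and may split a multi-byte character.

```php
<?php
// Hypothetical sketch of the "three consecutive chunks" search.
// Names, the default size and the dedup strategy are assumptions.
function find_in_windows(string $pattern, string $text, int $chunkSize = 500): array
{
    $chunks = str_split($text, $chunkSize);
    $found = [];
    // Index of the last window start; at least one window is always tested.
    $last = max(0, count($chunks) - 3);
    for ($i = 0; $i <= $last; $i++) {
        // Join three consecutive chunks so a match may span chunk borders.
        $window = implode('', array_slice($chunks, $i, 3));
        $offset = $i * $chunkSize; // byte offset of this window in $text
        if (preg_match_all($pattern, $window, $m, PREG_OFFSET_CAPTURE) === false) {
            throw new RuntimeException('PCRE error code: ' . preg_last_error());
        }
        foreach ($m[0] as [$match, $pos]) {
            // Key by absolute offset so overlapping windows count a match once.
            $found[$offset + $pos] = $match;
        }
    }
    ksort($found);
    return $found;
}
```

A match longer than three chunks can never be found this way, which fits the observation that a dataentry of about 4500 characters is missed with small chunks. A pattern that can also match a truncated entry at a window border might produce partial hits, so the result is worth checking by hand.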

You can try setting the chunk size to a value like 2000, which means that the RegEx will be applied to strings of about 6000 characters. That will almost certainly produce no findings, though, unless you change your PHP configuration for PCRE and then restart Apache (see http://www.php.net/manual/en/pcre.configuration.php). Because the PCRE library can in some cases use a huge number of recursions, setting these values too high can crash PHP (which is exactly why I developed my chunk solution, but that alone won't suffice in your case).
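Since the script is run from a terminal, the PCRE limits could probably also be inspected and raised for a single run instead of editing php.ini. The sketch below only uses the two directives documented on the page linked above; the target values are placeholders chosen for illustration, not recommendations, and the crash risk described above applies to them as well.

```php
<?php
// Show the current PCRE limits, then raise them for this process only.
// The new values are PLACEHOLDERS (assumptions), not tested recommendations.
foreach (['pcre.backtrack_limit', 'pcre.recursion_limit'] as $key) {
    echo $key, ' = ', ini_get($key), PHP_EOL;
}
ini_set('pcre.backtrack_limit', '5000000');
ini_set('pcre.recursion_limit', '200000');
```

The same settings can be passed on the command line with PHP's `-d` option, for example `php -d pcre.backtrack_limit=5000000 <script>`, which leaves the web server configuration untouched.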