Closed developius closed 9 years ago
They are indeed, @OliCallaghan, I'm taking this considering I wrote the code and understand the problem it was made to solve.
data = cleanArray($('.emphasized-link').text().replace(/ +?|\r/g, '').split("\n"));
is the bit that breaks stuff.
Thanks @popey456963 - when this is fixed I think we're ready for a Beta release!
@popey456963 status?
Okay, so, what is happening is that I get an ugly array:
[ blarg \r\r\r\r \r\r\r bash]
And, before, what I was doing was simply ignoring those to get:
[blarg, bash]
What I didn't realise, though, was that it also removes spaces:
Business Studies --> BusinessStudies
To fix this I've had to remove the line above, and instead used:
data = cleanArray($('.emphasized-link').text().replace(/\r/g, '').split("\n"));
(Note the missing / +?| part.) However, this returns six undefined values for every actual word. To fix this I have two options: doing it client-side or doing it server-side. The client-side method I've already shown you, and consists of looping through each entry, testing whether it's undefined, and if it is, removing it.
The server-side method is to detect whether each array entry contains four digits (most likely using regex) and then add those entries (and only those) to a new array, which is then passed to the callback.
I started working on the client-side approach, but promptly gave up when informed that its efficiency was so bad (it also means the client receives 6x the data). The server-side solution is coming on nicely, with the regex being short (/\d{4}/). Currently I'm having some problems making my solution efficient, and I'm looking for alternatives to regex (which is using a considerable amount of time on each pass, and requires 6*categories checks for each exam board).
So I am looking for suggestions of better methods, other than regex, for filtering an array as described above. Any ideas?
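For illustration, the server-side filter described above could be sketched roughly like this. It is only a sketch, not the project's actual code: keepSyllabusEntries and the sample data are made up, and it assumes every wanted entry contains four consecutive digits while the junk entries do not.

```javascript
// Hypothetical helper (not from the project): keep only the entries that
// contain a four-digit syllabus code, dropping undefined and empty ones.
function keepSyllabusEntries(entries) {
  const fourDigits = /\d{4}/; // compiled once and reused for every entry
  return entries.filter(
    (entry) => typeof entry === 'string' && fourDigits.test(entry)
  );
}

// Made-up sample resembling the scraped output described above.
const raw = ['', undefined, 'Business Studies 9370', '', 'Physics 9702', undefined];
console.log(keepSyllabusEntries(raw));
// → [ 'Business Studies 9370', 'Physics 9702' ]
```

The pattern has no g flag, so test() carries no state between calls, and each entry is scanned at most once.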
@popey456963 I can't think of another way to remove those things from the array... :(
@popey456963 status?
Oh yeah, I completely forgot about this. I have code that stops the removal of spaces and instead filters array entries based on whether or not they contain four digits. This works great, except randomly (about 1 in 10, different every time?!) we get an entry that looks like:
\t\n\t\t\t\t\t\nSubject\t\t\t\t9999\t\t\t\t\n
Which doesn't print out in a drop-down box very well. I'm currently writing code that removes all \n's and only removes \t's when there is more than one in a row; however, I am running into some trouble. I don't really want to use regex as that is hideously slow, so I'm using something that should hopefully be O(n).
@popey456963 regex is going to be quick enough, I think, so if it's easier just use that.
If you're set on not using regex then I suggest you replace 4 \t's (as long as it's always 4 \t's between the subject name and the syllabus number) with a space and then trim all spaces from the start and all spaces from the end. That should give you what you want.
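A minimal sketch of that regex-free suggestion, assuming the separator really is always exactly four tabs. tidyEntry is a made-up name, and the sample string is the problem entry quoted earlier in the thread.

```javascript
// Hypothetical helper: replace every run of four tabs with a single space,
// then trim leftover whitespace from both ends.
function tidyEntry(entry) {
  // split/join is a regex-free "replace all" and is linear in the input length.
  // trim() also strips the stray tabs and newlines at either end, not just spaces.
  return entry.split('\t\t\t\t').join(' ').trim();
}

const messy = '\t\n\t\t\t\t\t\nSubject\t\t\t\t9999\t\t\t\t\n';
console.log(JSON.stringify(tidyEntry(messy)));
// → "Subject 9999"
```

If the number of tabs between the name and the number ever varies, this would leave stray tabs in the middle of the string, so it is worth checking that assumption against real pages.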
This regex, [a-zA-Z](\t{4})\d, almost does what you want, except it's (in Python anyway) including the last character of the name and the first character of the number, which is strange because it's using a capture group to avoid exactly that issue...
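The behaviour described here is expected in both Python and JavaScript: the overall match always covers every character the pattern consumes, and a capture group only labels part of it. One way around it, sketched below in JavaScript, is to capture the neighbouring characters and put them back in the replacement. joinNameAndCode is a made-up name, and the four-tab separator is still an assumption.

```javascript
const sample = 'Subject\t\t\t\t9999';

// The full match (index 0) includes the letter and the digit on either side;
// only index 1 holds the captured tabs.
const match = sample.match(/[a-zA-Z](\t{4})\d/);
console.log(JSON.stringify(match[0])); // → "t\t\t\t\t9"
console.log(JSON.stringify(match[1])); // → "\t\t\t\t"

// Hypothetical helper: capture the letter and the digit, then re-insert them
// around a single space so neither character is lost.
function joinNameAndCode(entry) {
  return entry.replace(/([a-zA-Z])\t{4}(\d)/g, '$1 $2');
}

console.log(joinNameAndCode(sample)); // → Subject 9999
```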
@popey456963 how far did you get with this regarding my previous comment?
I think spaces are being removed!