Open JackuB opened 8 years ago
I am just thinking if it is necessary to have it here? Basically it's optimisation that can be applied in place where detection is done.
Anyway it is good carch that running rexexp against huge codes is unnecessary, because in most cases we are able to detect type from couple of chars from beginning.
So lets do it. :+1:
:+1:
However, it has scenarios with wrong results:
"swagger":
key, but somewhere at the bottom as order of properties is undefined in JSON or YAML and shouldn't matter.You may apply another optimization: Looking for swagger
word in the input string using just .indexOf
, which, I believe, is the fastest way to do simple lookup (I have no prove for this). If there is a match, you may proceed to the heavyweight RegExp matching. Also, .indexOf
will give you position, so you have information about how much of the input you may afford to cut off.
@honzajavorek To you concern about wrong identification. I think we shouldn't do this https://github.com/apiaryio/deckardcain/blob/master/src/deckardcain.js#L60 at all.
And to yours optimization: Do you know it will have significant impact? Also this approach can be very simply tricked:
FORMAT: 1A
# swagger
My final proposal is to add simple optimizations for each type e.g.:
API Blueprint As we know format must be at the beginning.
if (source.substring(0, SAMPLE_LENGTH).match(API_BLUEPRINT_HEADER)) {
// ...
}
For swagger in yaml we can use sample as well, because we know that most of the swagger documents have it specified at the beginning. So in most cases it will speed up.
With this way we avoid running all tests twice and also will have targeted optimization for each type.
@JackuB Did you run into problem with speed or is it just pre optimization?
@freaz yep, only on really big blueprints (+10k LOC). Typing can get laggy, but I didn't test how much is it affected by ACE itself etc.
It's not big issue right now - works fine for most blueprints. Anyway, I have idea for PR, will do it once I'll have some time.
To you concern about wrong identification. I think we shouldn't do this https://github.com/apiaryio/deckardcain/blob/master/src/deckardcain.js#L60 at all.
This https://github.com/apiaryio/deckardcain/blob/master/src/deckardcain.js#L55 carries the same vulnerability and it's the core of Apiary Blueprint identification, so my concern is valid even if we do not check for + Response
.
Do you know it will have significant impact?
I did no measurements. My intention was to offload "expensive" regexp matching only to input indicating the regexp has some probability to match anything, because this way we can optimize and at the same time we do not put us into the danger of wrong identification. Shortening the input based on wishful thinking that the swagger
keyword is usually at the top so it will be always at the top is just wrong, IMHO.
Also this approach can be very simply tricked
My approach would take this input as:
.indexOf
with swagger
. Result: It is present, thus it's worth to try RegExp. The .indexOf
provides position, so we can possibly guess how much we can cut off without losing the interesting part.My final proposal is to add simple optimizations for each type
I agree, this is reasonable. I do not agree we can use simple fixed-length sample for either Swagger JSON or YAML.
My algorithm breaks into pieces, because when it comes to whether indexOf
is faster and simpler than RegExp, the answer is: not necessarily... :neutral_face:
http://jsperf.com/deckard-indexof-vs-regexp
But still, I think searching for just swagger
at first (either by indexOf
or by match
) could be still faster than checking the full RegExp, but the question is whether it's worth it.
I did some hard science: http://jsperf.com/deckardcain-string-shortening/3
I think we should change flow to this:
What do you think? @hj @freaz