Note that a 32-cycle setup requires specific hub timing, which isn't always
guaranteed when a waitpxx trigger is used. The no-trigger version, however,
could benefit from it.
Original comment by marko.lu...@kyi.biglobe.ne.jp
on 14 Sep 2011 at 5:03
This limitation is mainly due to the fact that only 6 cycles are left in
a 32-cycle loop, i.e. it works in only 6 out of 16 cases. Using 48 as a minimum
avoids that trap.
Original comment by marko.lu...@kyi.biglobe.ne.jp
on 14 Sep 2011 at 5:20
Thanks for this analysis, Marko.
We'll work on this when Sal gets back from his travels.
That should be some time this month.
Original comment by prof.bra...@gmail.com
on 14 Sep 2011 at 8:22
I am not sure I understand what you are saying, so I will
describe the code and the requirements in hopes that we can
clarify.
The primary requirements for this routine:
1. Sample every n cycles exactly
2. Provide a trigger mechanism for sampling
3. Sample as quickly as possible after the trigger
4. Have the dynamic range of n be as large as possible
So I have annotated the routine with timings, and by my
estimation the fastest I can meet these requirements is
39 cycles, hence the lower limit of 40.
If I understand the spec, the longest wrlong can ever take
is 22 cycles.
So as long as the worst-case scenario is accounted for, the
loop should meet the above requirements.
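The worst-case column in the table below is a running sum, which can be cross-checked mechanically. This sketch uses Python only as a calculator. The cycle counts are the figures Sal uses in his table (ordinary instructions 4, wrlong 7..22, waitcnt 5+). `ENTRY_V11` and `timeline` are hypothetical names. Accumulating the entry path reaches 39 clocks in the worst case:

```python
from itertools import accumulate

# (instruction, min clocks, max clocks) as used in the table below.
# waitcnt's maximum is open-ended; only its minimum matters for the budget.
ENTRY_V11 = [
    ("mov _treg1, ina", 4, 4),
    ("mov _treg2, cnt", 4, 4),
    ("add _treg2, __Dv_LScnt", 4, 4),
    ("wrlong _treg1, __Fv_LbufP", 7, 22),
    ("waitcnt _treg2, __Dv_LScnt", 5, 5),
]

def timeline(instrs):
    """Return (name, elapsed_min, elapsed_max) after each instruction."""
    mins = accumulate(lo for _, lo, _ in instrs)
    maxs = accumulate(hi for _, _, hi in instrs)
    return [(name, a, b) for (name, _, _), a, b in zip(instrs, mins, maxs)]

for name, tmin, tmax in timeline(ENTRY_V11):
    print(f"{name:28} {tmin:3} {tmax:3}")
```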
Now, if the only requirement is how fast we can get the loop to run:
the loop time would be 32 clocks. Unfortunately this does not meet the
above requirements, but it may be useful elsewhere.
TIMINGS of original code:
:asm
\ the mask of input bits we care about
v_LTm
0
\ the value of bits before we trigger
v_LTb
0
\ the value of bits we trigger on
v_LTa
0
\ a _sample ( -- )
\
a_sample
\ wait for trigger values
waitpeq v_LTb , v_LTm
waitpeq v_LTa , v_LTm
\ get the sample and set up the count for the next sample
\           Minimum Time  Maximum Time  Minimum Time  Maximum Time
\
mov _treg1 , ina
\ SAMPLE->   4   4         4   4
mov _treg2 , cnt
\            4   8         4   8
add _treg2 , __Dv_LScnt
\            4  12         4  12
__1
\ write out the sample
wrlong _treg1 , __Fv_LbufP
\            7  19        22  34
\                                        7  19        22  34
\ wait for the next sample time
waitcnt _treg2 , __Dv_LScnt
\            5  24         5  39 +
\                                        5  24        22  39 +
mov _treg1 , ina
\ SAMPLE ->                              4   4         4   4
add __Fv_LbufP , # 4
\                                        4   8         4   8
djnz __Ev_LSz , # __1
\                                        4  12         4  12
\ we are done
jnext
\ a pointer to the sample buffer
v_LbufP
__F
0
\ the sample buffer size
__E
v_LSz
0
\ the number of clock between samples (decimal 40 min)
__D
v_LScnt
0
;asm
Original comment by salsa...@gmail.com
on 2 Oct 2011 at 10:18
OK, first of all, apologies! Unfortunately I don't have the original test code
anymore, but my best guess is that the initial waitcnt setup is crucial and that
it probably wasn't present in this context. I know for a fact that I had lockups
for every delay that wasn't a multiple of 16 (16n).
Regardless, 40 cycles isn't the actual safe minimum (it locks up 2 out of 16
times).
I reviewed the timing, and all we have to do is raise the minimum to 41, which is
the 39 from your calculations, +1 (hubops are 8..23) and +1 (waitcnt is 6+).
Original comment by marko.lu...@kyi.biglobe.ne.jp
on 3 Oct 2011 at 2:08
I have checked the specs and the data sheet, wrlong is spec'd at 7-22 and
waitcnt is spec'd at 5+, in version 1.1, when I wrote the code.
I have never had release versions of LogicAnalyzer lock up, and it is a tool I
use a lot.
But the chips/boards I am using for test are 3+ years old. I note more recent
docs Manual 1.2 and spec 1.4 have changed timings. Was the change an errata or
an update to reflect more recent steppings?
I agree the minimum timings have to changes. This easy enough to change. But I
want to do some reasonable analysis testing to ensure it is reliable.
Will update in 5.0.
Do you agree with the updated timing analysis?
BTW thank you for pointing this out.
So the updated timing analysis should be:
:asm
\ the mask of input bits we care about
v_LTm
0
\ the value of bits before we trigger
v_LTb
0
\ the value of bits we trigger on
v_LTa
0
\ a _sample ( -- )
\
a_sample
\ wait for trigger values
waitpeq v_LTb , v_LTm
waitpeq v_LTa , v_LTm
\ get the sample and set up the count for the next sample
\           Minimum Time  Maximum Time  Minimum Time  Maximum Time
\
mov _treg1 , ina
\ SAMPLE->   4   4         4   4
mov _treg2 , cnt
\            4   8         4   8
add _treg2 , __Dv_LScnt
\            4  12         4  12
__1
\ write out the sample
wrlong _treg1 , __Fv_LbufP
\            8  20        23  35
\                                        8  20        23  35
\ wait for the next sample time
waitcnt _treg2 , __Dv_LScnt
\            6  26         6  41 +
\                                        6  26        23  41 +
mov _treg1 , ina
\ SAMPLE ->                              4   4         4   4
add __Fv_LbufP , # 4
\                                        4   8         4   8
djnz __Ev_LSz , # __1
\                                        4  12         4  12
\ we are done
jnext
\ a pointer to the sample buffer
v_LbufP
__F
0
\ the sample buffer size
__E
v_LSz
0
\ the number of clock between samples (decimal 40 min)
__D
v_LScnt
0
;asm
Original comment by salsa...@gmail.com
on 3 Oct 2011 at 3:55
> I have checked the specs and the data sheet, wrlong is spec'd at 7-22 and
> waitcnt is spec'd at 5+, in version 1.1, when I wrote the code.
Those timings were never correct. After more than 2 years of chewing Parallax's
ears I managed to convince them to update the documentation.
> I have never had release versions of LogicAnalyzer lock up, and it is a tool
> I use a lot.
It's tricky but easy enough to reproduce.
mov _treg1 , ina
mov _treg2 , cnt ' -8 (%%)
add _treg2 , __Dv_LScnt ' -4 (%%)
wrlong _treg1 , __Fv_LbufP ' +0 = perfect alignment (%%)
waitcnt _treg2 , __Dv_LScnt ' +8 %% consumes __Dv_LScnt + 5 (%%)
mov _treg1 , ina ' +37
add __Fv_LbufP , # 4 ' +41
djnz __Ev_LSz , # __1 ' +45
wrlong _treg1 , __Fv_LbufP ' +64 = 15 idle cycles (waiting for window)
waitcnt _treg2 , __Dv_LScnt ' +72
The first waitcnt sees the match at 34 (37-3); the second would see it at 75 (72+3).
Unfortunately that's one cycle late. As I said, it's 2 in 16, so it may well work
for you without you noticing that there is a glitch.
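The trace above can be replayed with a small timing model. This Python sketch is an illustration, not Propeller code, and rests on assumptions drawn from the trace:
- hub slots arrive every 16 clocks, and a hub op finishes 8 clocks after its slot (8..23 clocks in total);
- `mov _treg2, cnt` latches the counter 2 clocks after issue, which reproduces the 34 above with a 40-clock delay;
- `waitcnt` can see a match no earlier than 3 clocks after issue and releases 3 clocks after the match.

`locks_up` and `failing_phases` are hypothetical names.

```python
HUB_PERIOD = 16      # a cog gets a hub slot every 16 clocks
HUBOP_CLOCKS = 8     # hub op once its slot arrives (8..23 including the wait)
INSTR = 4            # ordinary instruction

def next_slot(t, phase):
    """First hub slot at or after clock t; slots fall at phase, phase + 16, ..."""
    return t + (phase - t) % HUB_PERIOD

def locks_up(delay, phase, samples=64):
    """True if some waitcnt in the sampling loop misses its target."""
    target = 6 + delay            # mov _treg2, cnt issues at 4; cnt assumed latched at 6
    t = 3 * INSTR                 # mov ina, mov cnt, add done; wrlong issues at 12
    for _ in range(samples):
        t = next_slot(t, phase) + HUBOP_CLOCKS   # wrlong _treg1, __Fv_LbufP
        if target < t + 3:        # waitcnt sees the counter 3 clocks after issue
            return True           # missed: it now waits for a full 32-bit wrap
        t = target + 3            # released 3 clocks after the match
        target += delay           # waitcnt adds __Dv_LScnt to its target
        t += 3 * INSTR            # mov ina, add, djnz
    return False

def failing_phases(delay):
    """How many of the 16 possible hub phases lead to a lockup."""
    return sum(locks_up(delay, p) for p in range(HUB_PERIOD))

print(failing_phases(40), failing_phases(41))  # 2 0
```

Phase 12 replays the trace above. The first wrlong hits a perfectly aligned slot, and the second waits 15 clocks, which leaves the second waitcnt one clock late. Under this model a 40-clock delay fails for 2 of the 16 phases and a 41-clock delay for none.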
> Do you agree with the updated timing analysis?
Looks OK (formatting is a bit odd). I would have done it differently; the loop
(not counting the exit) is basically
waitcnt
12 cycles
hubop
which amounts to a worst case of 6 + 12 + 23 = 41, so that the next waitcnt doesn't
stall. The entry condition is
sample cnt
add delay
hubop
waitcnt
which requires 8 + 23 + 6 = 37 to be safe, which leaves us with 41. I have a
test program here that will show you the above error case.
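The two budgets above can be written out directly. This is a minimal sketch using Python as a calculator. The figures come from this thread, and the helper names are hypothetical. Plugging in the older 22/5 figures gives back the 39 from the original analysis:

```python
INSTR = 4          # ordinary instruction
HUBOP_MAX = 23     # worst-case hub op, per the updated documentation
WAITCNT_MIN = 6    # minimum waitcnt, per the updated documentation

def loop_minimum(hubop_max=HUBOP_MAX, waitcnt_min=WAITCNT_MIN):
    """waitcnt, then 12 clocks of mov/add/djnz, then the hub op."""
    return waitcnt_min + 3 * INSTR + hubop_max

def entry_minimum(hubop_max=HUBOP_MAX, waitcnt_min=WAITCNT_MIN):
    """Sample cnt and add the delay (8 clocks), the hub op, then waitcnt."""
    return 2 * INSTR + hubop_max + waitcnt_min

def safe_delay(hubop_max=HUBOP_MAX, waitcnt_min=WAITCNT_MIN):
    """The sample interval must satisfy both the loop and the entry budget."""
    return max(loop_minimum(hubop_max, waitcnt_min),
               entry_minimum(hubop_max, waitcnt_min))

print(loop_minimum(), entry_minimum(), safe_delay())  # 41 37 41
print(safe_delay(22, 5))                              # 39 with the older figures
```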
> BTW thank you for pointing this out.
No problem. I still spent this Monday being embarrassed :)
Original comment by marko.lu...@kyi.biglobe.ne.jp
on 3 Oct 2011 at 4:31
FWIW, the fastest single-cog ina-to-hub transfer these days is 16 cycles (for as
long as there is space in hub). That makes trigger-based sampling a bit tricky,
though, since you're tied to the hub window. HTH
Original comment by marko.lu...@kyi.biglobe.ne.jp
on 3 Oct 2011 at 4:44
> ... (formatting is a bit odd).
I think I get it now. The left side is the entry condition; the right side is
the loop behaviour. In that case the 41+ on the left is a bit misleading, since
the initial 4 cycles for sampling ina are not relevant to the timing constraints.
That said, I don't know what you're trying to express here. If it's just the
timeline, then I agree. OTOH, if you're after the constraints, then it should
probably be made clearer, i.e. the 4 instructions starting from mov _treg2, cnt
up to and including waitcnt consume __Dv_LScnt + 5 cycles, 14 (min) of which are
consumed by setup and waitcnt (4+4+6), which leaves the remainder for the hubop.
E.g. delay = 41 gives a total runtime of 46 cycles, and the hubop has a huge
32-cycle window (more than enough).
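The entry-path constraint described above reduces to one subtraction. This is a minimal sketch in Python. The "delay + 5" total and the 4 + 4 + 6 setup cost are taken from the comment, and `hub_window` is a hypothetical helper:

```python
def hub_window(delay, setup=4 + 4, waitcnt_min=6):
    """Total clocks for 'mov _treg2, cnt' .. 'waitcnt', and the share left for the hub op.

    The four instructions take delay + 5 clocks; the two setup instructions and
    the minimum waitcnt use 4 + 4 + 6 = 14 of them.
    """
    total = delay + 5
    return total, total - (setup + waitcnt_min)

print(hub_window(41))  # (46, 32)
```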
Original comment by marko.lu...@kyi.biglobe.ne.jp
on 3 Oct 2011 at 5:06
I am currently working on porting everything to 5.0, and LogicAnalyzer is on my
list for the next few weeks. I am going to add a regression test:
1. Generate an nn-cycle square wave on pin xx
2. Run the different sample routines
3. Do an in-memory check of the square wave to make sure there are no phasing
(or any other) errors.
By running this test for each sample routine, I should be able to verify that the
sample routines are working properly at all sample rates. By sampling at the
signal frequency (which is really undersampling) or any integer multiple of the
frequency, a phase error should show up if there are any problems.
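The in-memory check can be sketched as a phase-consistency test. This Python version is illustrative only. `square_wave` and `phase_consistent` are hypothetical names, and the sketch assumes an ideal 50% duty-cycle square wave with an even period in clocks. If the capture cannot be explained by a single start phase, some sample was lost or taken at the wrong time:

```python
def square_wave(t, half_period):
    """Level at clock t of a square wave that is high for the first half_period clocks."""
    return 1 - (t // half_period) % 2

def phase_consistent(samples, interval, half_period):
    """True if one fixed start phase explains every sample.

    A lost or late sample shifts all later sample times, so the captured
    pattern typically can no longer be explained by a single phase.
    """
    period = 2 * half_period
    return any(
        all(s == square_wave(phase + k * interval, half_period)
            for k, s in enumerate(samples))
        for phase in range(period)
    )

# Sampling at the signal period: every good sample must be identical.
print(phase_consistent([1] * 20, 40, 20))             # True
print(phase_consistent([1] * 10 + [0] * 10, 40, 20))  # False
```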
I did this manually in the early versions, but it never made it into the test
cycle, and it did not make the 4.6 regressions.
Will work with Doug to get an updated LogicAnalyzer out in the shorter term.
Original comment by salsa...@gmail.com
on 3 Oct 2011 at 1:41
Some empirical results:
I set up a test that samples continuously into main memory with the above
routine, 3000 samples.
At a sample interval of 39 cycles it fails immediately.
At a sample interval of 40 cycles it eventually fails (I heated the Propeller
with a hair dryer and it would usually start to fail within 5 minutes; I did not
have an accurate way to measure the temperature, but it was hot).
At a sample interval of 42 cycles I have not yet seen a failure.
Original comment by salsa...@gmail.com
on 5 Oct 2011 at 2:19
How did you trigger the sample start?
Original comment by marko.lu...@kyi.biglobe.ne.jp
on 5 Oct 2011 at 2:45
I wrote specialized code in 5.0, generated a signal, and sampled it.
I will make sure it makes it into the 5.0 regression.
la in 5.0 now has an added routine that can sample at intervals from 18 clocks
up. If you would do a deep dissection of these, I would be very happy to get
your feedback.
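The 18-clock floor follows from the loop bodies. In `_la_asample18+` the samples stay in cog RAM, so the worst-case hub op drops out of the loop. The `_la_sample` dispatcher in the listing compares the interval against hex literals, so `x29` is 41 and `x12` is 18. This is a rough sketch using Python as a calculator. The clock figures are the 4 / 6+ / 23 values discussed earlier in this thread, and `LOOP_41`, `LOOP_18`, `min_interval` and `pick_routine` are illustrative names. It assumes `x3 ST@` fetches `samplecycle`, as the comparisons suggest, and does not model the outer `x4 ST@ x310 >=` guard.

```python
# Clock counts per instruction: ordinary 4, waitcnt at least 6, hub op at worst 23.
LOOP_41 = [  # _la_asample41+: the hub write is inside the loop
    ("waitcnt", 6), ("mov ina", 4), ("add ptr", 4), ("djnz", 4), ("wrlong", 23),
]
LOOP_18 = [  # _la_asample18+: samples go to cog RAM and are copied out afterwards
    ("waitcnt", 6), ("mov ina", 4), ("add __2", 4), ("djnz", 4),
]

def min_interval(loop):
    """Smallest interval at which waitcnt never misses, using these worst-case figures."""
    return sum(clocks for _, clocks in loop)

def pick_routine(samplecycle):
    """Mirror of the _la_sample dispatch on the sample interval (clocks)."""
    if samplecycle >= 0x29:          # x29 = 41
        return "_la_asample41+"
    if samplecycle >= 0x12:          # x12 = 18
        return "_la_asample18+"
    if samplecycle == 0x4:
        return "_la_asample4"
    if samplecycle == 1:
        return "_la_sample1"
    return None                      # _la_sample drops its arguments and returns 0

print(min_interval(LOOP_41), min_interval(LOOP_18))  # 41 18
```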
\ this variable is used to sync the 4 cogs which are doing interleaved sampling
variable _la_s1time
wvariable _la_s1addr
\ this variable holds the first of 4 sequential cogs used to sample every clock cycle; defaults to cog 2
wvariable _la_s1cog 2 _la_s1cog W!
wvariable _la_s1size
\ _la_is1 ( -- ) this is the interleaved routine used by 4 cogs to sample every clock cycle
: _la_is1
_la_s1addr W@ cogid _la_s1cog W@ - 2* 2* +
_la_s1time L@ cogid _la_s1cog W@ - +
_la_asample1
2* 2* _la_s1size W!
;
\ _la_sample1 ( baseaddr numsamples samplecycle triggerbefore triggerafter triggermask -- numsamples )
: _la_sample1
3drop 2drop
_la_s1addr W!
_la_s1cog W@ dup cogreset
1+ dup cogreset
1+ dup cogreset
1+ cogreset
zeroFreeDict
x10 delms
clkfreq cnt COG@ + _la_s1time L!
c" _la_is1"
dup _la_s1cog W@ cogx
dup _la_s1cog W@ 1+ cogx
dup _la_s1cog W@ 2+ cogx
_la_s1cog W@ x3 + cogx
_la_s1time L@ x8000 + 0 waitcnt drop
_la_s1size W@
;
\ _la_sample ( baseaddr numsamples samplecycle triggerbefore triggerafter triggermask -- numSamples )
: _la_sample
x4 ST@ x310 >=
if
x3 ST@ x29 >=
if
_la_asample41+
else
x3 ST@ x12 >=
if
_la_asample18+
else
x3 ST@ x4 =
if
_la_asample4
else
x3 ST@ 1 =
if
_la_sample1
else
3drop 3drop 0
then
then
then
then
else
3drop 3drop 0
then
;
\ _la_asample41+ ( baseaddr numsamples samplecycle triggerbefore triggerafter triggermask -- numsamples )
build_BootOpt :rasm
\ trigger mask
mov $C_treg6 , $C_stTOS
spop
\ trigger after
mov $C_treg5 , $C_stTOS
spop
\ trigger before
mov $C_treg4 , $C_stTOS
spop
\ sample cycle
mov $C_treg3 , $C_stTOS
spop
\ num samples
mov $C_treg2 , $C_stTOS
spop
\ base address - $C_treg1
mov $C_treg1 , $C_stTOS
mov $C_stTOS , $C_treg2
\
\ $C_treg1 - baseaddr
\ $C_treg2 - numsamples
\ $C_treg3 - samplecycle
\ $C_treg4 - triggerbefore
\ $C_treg5 - triggerafter
\ $C_treg6 - triggermask
\
\
\ wait for trigger
\
waitpeq $C_treg4 , $C_treg6
waitpeq $C_treg5 , $C_treg6
\
\ get the sample and set up the count for the next sample
\
\
\ t = 0
mov $C_treg6 , ina
\ t = 4
mov $C_treg5 , cnt
\ t = 8
add $C_treg5 , $C_treg3
\
\ $C_treg1 - baseaddr
\ $C_treg2 - numsamples
\ $C_treg3 - samplecycle
\ $C_treg4 - triggerbefore
\ $C_treg5 - nextcounttosample
\ $C_treg6 - current sample
\
__1
\
\ write out the sample
\
\ t = 12
wrlong $C_treg6 , $C_treg1
\
\ wait for the next sample time
\
\ t = 20 - 35
waitcnt $C_treg5 , $C_treg3
\ t = 26 - 41
\ t = 0
mov $C_treg6 , ina
\ t = 4
add $C_treg1 , # 4
\ t = 8
djnz $C_treg2 , # __1
\ t = 12
\ we are done
jexit
\
;asm _la_asample41+
\ _la_asample18+ ( baseaddr numsamples samplecycle triggerbefore triggerafter triggermask -- numsamples )
build_BootOpt :rasm
\ trigger mask
mov $C_treg6 , $C_stTOS
spop
\ trigger after
mov $C_treg5 , $C_stTOS
spop
\ trigger before
mov $C_treg4 , $C_stTOS
spop
\ sample cycle
mov $C_treg3 , $C_stTOS
spop
\ num samples
\ mov $C_treg2 , $C_stTOS
spop
\ base address - $C_treg1
mov $C_treg1 , $C_stTOS
mov $C_treg2 , # par
sub $C_treg2 , # __buffer
mov $C_stTOS , $C_treg2
\
\ $C_treg1 - baseaddr
\ $C_treg2 - numsamples
\ $C_treg3 - samplecycle
\ $C_treg4 - triggerbefore
\ $C_treg5 - triggerafter
\ $C_treg6 - triggermask
\
\
\ wait for trigger
\
waitpeq $C_treg4 , $C_treg6
waitpeq $C_treg5 , $C_treg6
\
\ get the sample and set up the count for the next sample
\
\
\ t = 0
mov __buffer , ina
\ t = 4
mov $C_treg5 , cnt
\ t = 8
add $C_treg5 , $C_treg3
\
\ $C_treg1 - baseaddr
\ $C_treg2 - numsamples
\ $C_treg3 - samplecycle
\ $C_treg4 - triggerbefore
\ $C_treg5 - nextcounttosample
\ $C_treg6 - current sample
\
__1
\
\ wait for the next sample time
\
\ t = 12
waitcnt $C_treg5 , $C_treg3
\ t = 18
\ t = 0
__2
mov __buffer1 , ina
\ t = 4
add __2 , $C_fDestInc
\ t = 8
djnz $C_treg2 , # __1
\ t = 12
mov $C_treg2 , $C_stTOS
__3
wrlong __buffer , $C_treg1
add __3 , $C_fDestInc
add $C_treg1 , # 4
djnz $C_treg2 , # __3
\ we are done
jexit
__buffer
0
__buffer1
0
\
;asm _la_asample18+
\ _la_asample4 ( baseaddr numsamples samplecycle triggerbefore triggerafter triggermask -- numsamples )
build_BootOpt :rasm
\ trigger mask
mov $C_treg6 , $C_stTOS
spop
\ trigger after
mov $C_treg5 , $C_stTOS
spop
\ trigger before
mov $C_treg4 , $C_stTOS
spop
\ sample cycle
spop
\ num samples
spop
\ base address - $C_treg1
mov $C_treg1 , $C_stTOS
mov $C_treg2 , # par
sub $C_treg2 , # 1
movd __3 , $C_treg2
sub $C_treg2 , # __buffer
mov $C_stTOS , $C_treg2
__1
mov __buffer , __inainst
__2
movd __buffer , # __buffer
add __1 , $C_fDestInc
add __2 , $C_fDestInc
add __2 , # 1
djnz $C_treg2 , # __1
movs __jmpinst , # __4
__3
mov 0 , __jmpinst
jmp # __sample
__4
mov $C_treg2 , $C_stTOS
__5
wrlong __buffer , $C_treg1
add __5 , $C_fDestInc
add $C_treg1 , # 4
djnz $C_treg2 , # __5
\ we are done
jexit
__inainst
xA0BC01F2
__jmpinst
x5C7C0000
\
\ wait for trigger
\
__sample
waitpeq $C_treg4 , $C_treg6
waitpeq $C_treg5 , $C_treg6
__buffer
0
\
;asm _la_asample4
\ _la_asample1 ( baseaddr startcount -- numsamples )
build_BootOpt :rasm
\ startcount
mov $C_treg6 , $C_stTOS
spop
\ base address
mov $C_treg1 , $C_stTOS
mov $C_treg2 , # par
sub $C_treg2 , # 1
movd __3 , $C_treg2
sub $C_treg2 , # __buffer
mov $C_stTOS , $C_treg2
__1
mov __buffer , __inainst
__2
movd __buffer , # __buffer
add __1 , $C_fDestInc
add __2 , $C_fDestInc
add __2 , # 1
djnz $C_treg2 , # __1
movs __jmpinst , # __4
__3
mov 0 , __jmpinst
jmp # __sample
__4
mov $C_treg2 , $C_stTOS
__5
wrlong __buffer , $C_treg1
add __5 , $C_fDestInc
add $C_treg1 , # x10
djnz $C_treg2 , # __5
\ we are done
jexit
__inainst
xA0BC01F2
__jmpinst
x5C7C0000
\
\ wait for trigger
\
__sample
waitcnt $C_treg6 , # 0
__buffer
0
\
;asm _la_asample1
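The 4-clock and 1-clock routines above rely on self-modifying code:
- The setup loop fills cog RAM from `__buffer` up to `par - 2` with `mov <self>, ina` and places a `jmp` back to the copy-out code at `par - 1`. Each instruction then overwrites itself with one INA sample, 4 clocks after the previous one.
- For the 1-clock mode, `_la_is1` starts four cogs one clock apart and offsets each cog's hub pointer by 4 bytes. `_la_asample1` advances the pointer by `x10` (16 bytes), so the samples interleave into time order.

The Python sketch below models both steps. It assumes the standard Propeller 1 field layout (S in bits 0..8, D in bits 9..17). All function names and the example buffer address are hypothetical.

```python
# Propeller 1 instruction fields: S = bits 0..8, D = bits 9..17.
S_MASK = 0x1FF
D_SHIFT = 9
D_MASK = 0x1FF << D_SHIFT

MOV_INA = 0xA0BC01F2   # __inainst: mov 0, ina (S = $1F2 = INA, D patched by movd)
JMP_IMM = 0x5C7C0000   # __jmpinst: jmp #0 (S patched by movs)
INA = 0x1F2
PAR = 0x1F0            # '# par': end of the usable cog RAM buffer

def set_d(instr, d):
    return (instr & ~D_MASK) | ((d << D_SHIFT) & D_MASK)

def set_s(instr, s):
    return (instr & ~S_MASK) | (s & S_MASK)

def get_d(instr):
    return (instr & D_MASK) >> D_SHIFT

def get_s(instr):
    return instr & S_MASK

def build_sampler(buffer, exit_addr):
    """Cog RAM image: buffer..PAR-2 hold 'mov <self>, ina', PAR-1 holds 'jmp #exit_addr'."""
    ram = {addr: set_d(MOV_INA, addr) for addr in range(buffer, PAR - 1)}
    ram[PAR - 1] = set_s(JMP_IMM, exit_addr)
    return ram

def run_sampler(ram, buffer, ina_values):
    """Execute the straight-line run and return (samples, jump target).

    Each 'mov <self>, ina' replaces itself with the current INA value, so the
    samples end up exactly where the instructions were, 4 clocks apart.
    """
    samples = iter(ina_values)
    for addr in range(buffer, PAR - 1):
        instr = ram[addr]
        assert get_s(instr) == INA
        ram[get_d(instr)] = next(samples)
    return [ram[a] for a in range(buffer, PAR - 1)], get_s(ram[PAR - 1])

def interleave(per_cog):
    """1-clock mode: cog i's k-th sample (taken at start + i + 4*k) lands at long
    index i + 4*k, because its pointer starts 4*i bytes in and steps x10 bytes."""
    n = len(per_cog)
    out = [None] * (n * len(per_cog[0]))
    for i, samples in enumerate(per_cog):
        for k, value in enumerate(samples):
            out[i + n * k] = value
    return out
```

For example, with a hypothetical `__buffer` at `$1C0`, the run holds `PAR - 1 - $1C0 = 47` samples. This matches the count the setup code leaves in `$C_stTOS`. Feeding each of four cogs its own sample times shows `interleave` restoring one sample per clock.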
Original comment by salsa...@gmail.com
on 12 Oct 2011 at 6:31
Mostly cosmetic stuff:
_la_asample1:
mov $C_treg2 , # par
sub $C_treg2 , # 1
can be combined as well as
add __2 , $C_fDestInc
add __2 , # 1
If you don't have space for a $C_fDestIncSrcInc then make sure carry is set and
use
addx __2 , $C_fDestInc
This can be achieved by adding wc here (__inainst has s[31] set)
__1
mov __buffer , __inainst wc
Some (if not all) of the above also applies to _la_asample4 (instruction
combining).
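The `addx` trick works because `mov ... wc` copies bit 31 of the source into C and `addx D, S` computes D + S + C. The intervening `movd`, `add` and `djnz` do not write C, so it stays set. Below is a minimal Python check of that arithmetic. It assumes `$C_fDestInc` is `1 << 9` (one step in the D field), and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
DEST_INC = 1 << 9          # assumed value of $C_fDestInc: +1 in the D field (bits 9..17)
INAINST = 0xA0BC01F2       # __inainst, as listed in _la_asample4 / _la_asample1

def mov_wc(src):
    """mov D, S wc: C is set from S[31]."""
    return (src >> 31) & 1

def addx(d, s, c):
    """addx D, S: D + S + C, truncated to 32 bits."""
    return (d + s + c) & MASK32

def two_adds(d):
    """The original pair: add __2, $C_fDestInc / add __2, # 1."""
    return (((d + DEST_INC) & MASK32) + 1) & MASK32

carry = mov_wc(INAINST)
print(carry)  # 1
```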
Original comment by marko.lu...@kyi.biglobe.ne.jp
on 14 Oct 2011 at 2:04
41+ and 18+ look OK.
Original comment by marko.lu...@kyi.biglobe.ne.jp
on 14 Oct 2011 at 2:07
Thanks for the review, will do a cleanup pass for release. fDestInc is used by
the forth core interpreter.
Will do a combine pass, as for 18+ and 41+ every instruction occupies space
that could hold a sample.
Original comment by salsa...@gmail.com
on 14 Oct 2011 at 1:31
SPECS CHANGED after the code was written.
This fixed something that would otherwise have been lost.
Original comment by prof.bra...@gmail.com
on 21 Dec 2011 at 4:55
Original issue reported on code.google.com by
marko.lu...@kyi.biglobe.ne.jp
on 14 Sep 2011 at 4:56